ABSTRACT


A model of categorical perception is presented in which the mapping
between a physical acoustic continuum and the corresponding perceptual
continuum is assumed to be nonlinear. It is proposed that such nonlinearity
is sufficient to account for many, although not necessarily all, cases of
observed categorical perception. A new experimental paradigm is described
for testing this model of categorical perception. Stimuli are constructed
by adding together the waveforms of two speech signals with relative
weights $\alpha$ and $\beta$ such that the composite waveform $s'$ is
described by $s' = \alpha s_1 + \beta s_2$, where $s_1$ and $s_2$ are the
formant transitions from two initial or final stop consonants, and
$\beta = 1 - \alpha$. For $\alpha = 0$ the composite stimulus has the
identity of $s_2$, and for $\alpha = 1$ it has the identity of $s_1$. For
intermediate values of $\alpha$, a sharp transition between the two phonetic
categories is observed. Identification and discrimination tests show that
this relative intensity continuum is categorically perceived, and the
discrimination model derived earlier is shown to account adequately for the
observed discrimination results. Results of a selective adaptation test are
also presented in which the composition of the adaptor is systematically
varied along this relative intensity continuum. The results show that the
boundary shifts are a strong function of the acoustic makeup of the adaptor.
Dichotic listening results are also presented in which the effects of
interaural intensity and stimulus composition are investigated simultaneously.

The results indicate that this experimental paradigm should be useful for
studying ear dominance. Lastly, a tentative model based on power-law
excitation of neural populations is investigated in an attempt to unify
the results of the identification, discrimination, selective adaptation
and dichotic listening tests.
CHAPTER 1


INTRODUCTION 


1.1 BACKGROUND TO CATEGORICAL PERCEPTION

Categorical perception in speech was originally
characterized by Liberman, Harris, Hoffman and Griffith
(1957) in a classic experiment in which it was shown that
the discriminability of a series of synthetic /b/-/d/-/g/
stimuli was poorer within phonetic categories than between
categories. The outcome of this study is well known, and it
established a methodology for investigating categorical
perception which has lasted two decades. Liberman et al.
(1957) constructed a preliminary model of the ABX
discrimination results based on the extreme assumption that
the ability to discriminate between stimuli was strictly a
result of overt phonetic classifications of stimuli. This
model became the standard "test" for categorical perception:
if the discrimination results could be adequately predicted
by this model, then the continuum was "categorically
perceived" (Studdert-Kennedy, Liberman, Harris and Cooper,
1970). In the years since the Liberman et al. study,
various continua, both speech and nonspeech, have been
shown to be categorically perceived. However, this
proliferation of categorized continua has not brought with
it a deeper understanding of the nature of categorical
perception itself.

Discrimination studies invariably showed that the data
and the Liberman et al. model consistently differed:
discrimination, as measured by the ABX paradigm, was better
for within-category stimuli than was predicted on the basis
of category labelling. Fujisaki and Kawashima (1969, 1970)
extended the ABX discrimination model to admit an element of
auditory discrimination, i.e., discrimination based on
auditory and not phonetic characteristics of the signal.
Their model added a parameter to the Liberman et al. model
(which has come to be known as the "Haskins model"), and
consequently improved the fit between model and data. The
assumption underlying their model was that auditory memory
decays faster than phonetic memory, with phonetic
categorization playing the major role. The Fujisaki and
Kawashima model, while improving the fit between the model
and the data, did not substantially increase the
understanding of the phenomenon itself, since the model did
not require as input any physical specification of the
stimuli which were being discriminated.

The criteria for the demonstration of categorical
perception were canonized by Studdert-Kennedy et al.
(1970). In their formulation, the discrimination results
were required to show an enhanced peak at the phoneme
boundary (the "phoneme boundary effect"), and this peak had
to be predicted on the basis of the labelling
probabilities. Using these criteria, various researchers
were subsequently able to show that various nonspeech
continua were also categorically perceived (e.g., Locke and
Kellar, 1973; Cutting and Rosner, 1974; Pastore, Ahroon,
Baffuto, Friedman, Puleo and Fink, 1976). The belief now
commonly held is that categorical perception may be a
phenomenon characterizable at the psychophysical level, in
which the acoustic structure of the stimuli plays a direct
role (Pastore et al., 1976; Miller, Wier, Pastore, Kelly and
Dooling, 1976; Carney, Widin and Viemeister, 1977; Cutting
and Rosner, 1974). Part of this change of view evidently
stems from the demonstration of extant "categories" in
neonates (Eimas, Siqueland, Jusczyk and Vigorito, 1971;
Morse, 1972; Kuhl and Miller, 1975a) and animals (Kuhl and
Miller, 1975b). The question which arises from these
studies is whether categorical perception results from
natural perceptual boundaries or from learned category
boundaries. Evidence exists to show that both may be
involved. The demonstrations with chinchillas indicate that
certain speech stimuli (e.g., /d/ and /t/) are sufficiently
far apart in perceptual space to be readily associated with
events in the environment (Kuhl and Miller, 1975b),
suggesting that /d/ and /t/ are "natural" categories. On
the other hand, Miyawaki, Strange, Verbrugge, Liberman,
Jenkins and Fujimura (1975), on the basis of /r/-/l/
distinctions by adult Japanese and American listeners, show
that the /r/-/l/ continuum is categorically perceived by
English subjects but not by adult Japanese. The implication
of this study is that some instances of categorical
perception in speech may be attributable to learned
distinctions. There is no evidence to date which clearly
suggests that all instances of categorically perceived
continua are attributable to the same underlying mechanisms.

Several mechanisms for categorical perception have
recently been proposed which are plausible in their
description but rather vague in their formulation. Miller et
al. (1976) suggest that it is

"... a single component of a stimulus complex that
is the variable. It is likely that the unchanged or
constant part of the stimulus complex provides an
immediate stimulus context against which the effects
of the changed component are judged." (p. 415)

while Pastore et al. (1976) suggest that categorical
perception is due to

"... a single, sharp, stable dichotomy or
limitation along a dimension causes both a natural
tendency to form a category boundary and, at the
same time, improves the precision of the information
used in discriminating stimuli separated by the
dichotomy or limitation." (p. 694)

However, neither Miller et al. nor Pastore et al. formalize
their models, which makes them difficult to test. Their
intent is clear, however: categorical perception may
reflect processing limitations of the sensory systems. This
appears especially true of the Pastore et al. (1976)
experiment, in which critical flicker fusion (CFF) was shown
to be categorically perceived along the frequency-of-flicker
dimension.

Categorical perception, then, remains an enigma. It is
fairly easily demonstrated, perhaps too easily, but has yet
to be explained in any psychophysical sense. While various
proposals, such as rapid decay of auditory information with
slow decay of phonetic information (Fujisaki and Kawashima,
1969; Pisoni, 1975), fixed signal components (Miller et al.,
1976) or a "stable dichotomy" (Pastore et al., 1976), are
reasonable, they have not been formalized into models which
require as input some physical variable associated with the
stimuli.

1.2 CATEGORICAL PERCEPTION AND SELECTIVE ADAPTATION

Since Eimas and Corbit (1973) it has often been
suggested that the extraction of phonetic information from
speech is mediated by neural constructs called "feature
detectors". These feature detectors presumably span a
stimulus continuum¹ and are characterized by a response
function which represents the sensitivity of the detector to
stimuli along this continuum. Under repeated presentation
of a stimulus drawn from this continuum, the detector is
thought to become desensitized, with the result that the
stimulus value at which the outputs of the two detectors
spanning the continuum are equal shifts in the direction of
the adapting stimulus. This shift has been taken as support
for the notion that the continuum is spanned by two separate
neural entities which selectively respond to stimuli along
the continuum. As in the case of categorical perception,
interpretation of phonetic boundary shifts is complicated
for want of a quantitative model. With the exception of
Elman (1979), no explicit formulation of a detector model
has yet been constructed. Elman's model assumes two Gaussian
detector response functions, and with this model he shows
that it is possible to account for boundary shifts by a
change in the subject's response bias. However, the model
has not been investigated in sufficient detail to show that
this is the only way in which the model can account for
these shifts.

¹ A "continuum" is a physical parameter of the stimuli
which, when varied over a certain range, causes a change in
percept from one phonetic category to another.
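As a rough illustration of the detector account described above, and not a reconstruction of Elman's (1979) formulation, the sketch below assumes two Gaussian response functions of equal width on a 0-to-1 continuum and models desensitization as a hypothetical loss of gain in the adapted detector. The centres (0.2 and 0.8), the width (0.2) and the 20% gain loss are illustrative assumptions. With equal widths the crossover point has a closed form, which makes the direction of the shift easy to see.

```python
import math


def gaussian_response(x, centre, width, gain=1.0):
    """Output of a hypothetical detector with a Gaussian tuning curve."""
    return gain * math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def crossover(c1, c2, width, g1=1.0, g2=1.0):
    """Stimulus value at which two equal-width Gaussian detectors respond equally.

    Solving g1*exp(-(x-c1)**2 / (2w**2)) = g2*exp(-(x-c2)**2 / (2w**2)) for x gives
    x = (c1 + c2)/2 + w**2 * ln(g1/g2) / (c2 - c1).
    """
    return 0.5 * (c1 + c2) + width ** 2 * math.log(g1 / g2) / (c2 - c1)


# Assumed, illustrative parameters: a /b/-like detector centred at 0.2 and a
# /d/-like detector centred at 0.8 on a 0-to-1 continuum, both of width 0.2.
C_B, C_D, WIDTH = 0.2, 0.8, 0.2

unadapted = crossover(C_B, C_D, WIDTH)
# Hypothetical adaptation with a /b/-end stimulus: the /b/ detector loses 20% of its gain.
adapted = crossover(C_B, C_D, WIDTH, g1=0.8)

# Sanity check: both detector outputs really are equal at the new crossover.
assert math.isclose(
    gaussian_response(adapted, C_B, WIDTH, gain=0.8),
    gaussian_response(adapted, C_D, WIDTH),
)
print(f"boundary without adaptation:   {unadapted:.3f}")
print(f"boundary after /b/ adaptation: {adapted:.3f}")
```

With these values the crossover moves from 0.500 to about 0.485, i.e., toward the /b/ end where the adaptor lies. The sketch cannot distinguish such a sensitivity change from the response-bias account that Elman shows to be possible.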

Now, what is the relation of selective adaptation to
categorical perception? These two phenomena have
traditionally formed two separate lines of research in the
speech perception literature, but it seems clear that they
must be related. Cooper (1974) investigated the change in
ABX discrimination under adaptation, and his results indicate
that the peaks of the discrimination curves shift in
accordance with the shifts in the category boundaries of the
labelling curves. This suggests that (a) identification and
discrimination involve the same physiological/perceptual
mechanisms, and (b) selective adaptation affects this system
in such a way that the peak of discriminability tends to
follow the category boundary. (It is not certain from his
results that the discrimination peak and the labelling
boundary always coincide, but this is a distinct
possibility.) It remains to be shown, however, that a
two-detector configuration spanning some physical continuum
can simultaneously account for categorical perception of the
continuum as well as the boundary shifts under selective
adaptation.


The detector theory of speech perception received
considerable impetus from selective adaptation studies, but
the concept of "detector" itself has a rather uncertain
status. Usually it reduces to little more than a graphical
construct to aid in the interpretation of results (e.g.,
Miller, 1977; Ainsworth, 1977). The failure to give
mathematical form to the theory weakens it rather than
strengthens it, since arguments for and against this
interpretation of category boundary shifts reduce to
exercises in verbal logic. The reluctance to formalize the
detector model perhaps stems from the fear of increasing the
vulnerability of the model by making it more explicit. Even
though detectors are at present little more than
"physiological metaphors" (Simon and Studdert-Kennedy,
1978), a translation into a mathematical metaphor is
certainly desirable. The quality of the metaphor can then be
judged by how well it quantifies the phenomena it is
supposed to explain.

1.3 THE PRESENT RESEARCH

This sets the stage for the present research. First,
various models of discrimination (and hence categorical
perception) are discussed in Chapter 2, leading to the
formulation of a signal detection theory (SDT) model of
discrimination. This model is based strictly on auditory
discrimination, and centers around the concept of
"dispersion", which reflects a non-linear mapping between
the physical acoustic continuum and the corresponding
perceptual continuum. It is shown that a detector model
along the lines of Eimas and Corbit (1973), as formulated
by Elman (1979), is a dispersive system and can
theoretically account for the categorical perception of a
continuum. It is also shown that such a detector
configuration can simultaneously account for phonetic
boundary shifts under adaptation. (This is the same model
which Elman uses to support a response bias account of
selective adaptation.)

In Chapter 3 a new experimental paradigm is described
for investigating categorical perception and various other
speech phenomena. Briefly, two CV syllables (e.g., /bae/
and /dae/) are mixed together by adding their digitized
waveforms. A categorically perceived continuum is then
formed when the relative intensities of the two component
signals are varied. Various experiments are described which
investigate the generality of this form of categorical
perception. In Chapter 4, a selective adaptation experiment
is described in which boundary shifts are shown to be a
strong function of the relative intensities of the /bae/ and
/dae/ components of the adaptor. These results suggest that
/b/ and /d/ detectors are affected independently and
simultaneously by the two components of the signal.
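The mixing operation can be sketched as a weighted sum of two equal-length sample sequences, $s' = \alpha s_1 + (1 - \alpha) s_2$, stepped through equally spaced values of $\alpha$. This is an illustrative assumption, not the procedure actually used: the two "syllables" below are hypothetical sine-tone stand-ins (120 Hz and 200 Hz at an arbitrary 10 kHz rate), not digitized /bae/ and /dae/ tokens, and the helper names are invented for the example.

```python
import math


def mix(s1, s2, alpha):
    """Composite waveform alpha*s1 + (1 - alpha)*s2 for equal-length signals."""
    if len(s1) != len(s2):
        raise ValueError("component waveforms must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return [alpha * a + beta * b for a, b in zip(s1, s2)]


def relative_intensity_continuum(s1, s2, n_steps):
    """Return (alpha, composite) pairs for n_steps equally spaced alphas from 0 to 1."""
    alphas = [i / (n_steps - 1) for i in range(n_steps)]
    return [(a, mix(s1, s2, a)) for a in alphas]


# Hypothetical stand-ins for two digitized syllables: 10 ms of audio at 10 kHz.
RATE = 10_000
t = [n / RATE for n in range(100)]
s1 = [math.sin(2 * math.pi * 120 * ti) for ti in t]
s2 = [math.sin(2 * math.pi * 200 * ti) for ti in t]

continuum = relative_intensity_continuum(s1, s2, n_steps=11)
```

The endpoints of `continuum` reproduce the two components unchanged ($\alpha = 0$ gives $s_2$, $\alpha = 1$ gives $s_1$); only the intermediate steps are true mixtures.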
In Chapter 5, a model of this "monaural fusion"
paradigm is constructed which assumes that the /b/ and /d/
processors are functionally separate, i.e., the /b/
component of the stimulus is recognized by a processor which
recognizes /b/, and the /d/ component is recognized by a
processor which recognizes /d/. Following the basic
framework laid out in Chapter 2, the model is extended to
account for discrimination and selective adaptation, and
suggests that, for all intents and purposes, the two signal
components behave as if they were presented over separate
auditory channels. A binaural extension of the model in
Chapter 6 is used to interpret the results of a dichotic
listening experiment, and the application of the model to
these results suggests that there is only a slight coupling
of the /b/ and /d/ processors. The four experimental
paradigms (identification, discrimination, selective
adaptation and binaural fusion) are thus shown to be
interpretable by a single model, a direct consequence of the
fact that this relative intensity continuum is categorically
perceived.


CHAPTER 2


MODELS OF CATEGORICAL PERCEPTION


2.1 PHONETIC-MEMORY MODELS

Categorical perception is defined by two observable
measures: the identification function (representing a
subject's ability to label stimuli which differ along some
physical continuum), and the discrimination function
(representing a subject's ability to discriminate between
stimuli drawn from that continuum). The criteria for
demonstration of categorical perception are stated in terms
of these two psychometric functions (Studdert-Kennedy et
al., 1970), and the "test" for categorical perception is how
well discrimination can be predicted from the corresponding
identification curve. The Haskins model (see Chapter 1)
has been shown to generally underpredict the
discriminability of most speech continua, and has been
modified by Fujisaki and Kawashima (1969, 1970) to include a
measure of auditory discriminability. Both of these models
assume that discrimination of speech stimuli results from an
overt phonetic classification by the subject, and will be
referred to hereafter as "phonetic-memory models".

In the Liberman et al. (1957) model of ABX
discrimination, the results of an identification, or
labelling, test are used as a posteriori estimates of the
probability that a subject will perceive a stimulus as
belonging to one of the phonetic categories. The expected
discrimination scores are then computed by enumerating the
various response probabilities for the ABX test paradigm
using these labelling probabilities (see Macmillan et al.,
1977; Pollack and Pisoni, 1971). For instance, if $p_1$ and
$p_2$ are the probabilities that two stimuli $x_1$ and $x_2$
contrasted in the ABX¹ paradigm are classified by the
subject as, say, category B, then the predicted
discrimination score is given by

$$P_D = 0.5\left[1 + (p_1 - p_2)^2\right] \qquad (2\text{-}1)$$

¹ In the ABX paradigm under discussion, X is always either A
or B.
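The enumeration behind Equation 2-1 can be checked mechanically. The sketch below assumes, as the Haskins model does, that every presentation of A, B and X is labelled independently, that X is equally likely to be A or B, and that the subject guesses (probability 0.5 of being correct) whenever A and B receive the same label. The function names are illustrative only.

```python
from itertools import product


def abx_enumerated(p1, p2, guess=0.5):
    """Exact P(correct) in ABX when responses depend only on phonetic labels.

    p1, p2: probabilities that A (= x1) and B (= x2) are labelled category B.
    guess:  probability of a correct response when A and B get the same label.
    """
    total = 0.0
    for x_is_a in (True, False):          # X = A or X = B, each with probability 0.5
        px = p1 if x_is_a else p2
        # Labels: 1 = category B, 0 = category A; each presentation labelled independently.
        for la, lb, lx in product((0, 1), repeat=3):
            prob = ((p1 if la else 1 - p1)
                    * (p2 if lb else 1 - p2)
                    * (px if lx else 1 - px))
            if la != lb:
                # Labels differ: answer "X = A" exactly when X's label matches A's.
                correct = (lx == la) == x_is_a
                total += 0.5 * prob * correct
            else:
                total += 0.5 * prob * guess
    return total


def haskins(p1, p2):
    """Equation 2-1."""
    return 0.5 * (1 + (p1 - p2) ** 2)


for p1, p2 in [(0.0, 0.0), (0.8, 0.2), (1.0, 0.0)]:
    print(p1, p2, round(abx_enumerated(p1, p2), 6), round(haskins(p1, p2), 6))
```

Both columns agree for any pair of labelling probabilities, and a within-category pair with $p_1 = p_2$ gives exactly 0.5.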

Assuming the identification function shows perfect
within-category labelling, the probability of discriminating
two within-category stimuli near the endpoints of the
continuum (i.e., $p_1 = p_2 = 0$ or $p_1 = p_2 = 1$) is

$$P_D = 0.5 \qquad (2\text{-}2)$$

That is, discrimination should occur at a strictly chance
level. Experimental results show, however, that some
within-category discrimination is possible. To accommodate
this disparity, Fujisaki and Kawashima (1970) extended the
Haskins model by positing a two-tier discrimination
process. Discrimination between stimuli, as in the Haskins
model, is considered to be an operation primarily involving
explicit phonetic categorization. If a subject perceives
the stimuli in contrasting phonetic categories, he responds
accordingly. However, if the subject does not perceive the
stimuli in contrasting categories, he then attempts to
discriminate by comparing the "timbres" of the auditory
images. The measure of auditory discriminability is
represented in the model by a "guessing factor" T. The
resulting equation is (Fujisaki and Kawashima, 1970)

$$P_D = 0.5\left[(p_1 - p_2)^2 + p_1(1 - p_2) + p_2(1 - p_1)\right] + T\left[p_1 p_2 + (1 - p_1)(1 - p_2)\right] \qquad (2\text{-}3)$$

When T = 0.5, this Fujisaki and Kawashima model² reduces to
the Haskins model. These models have been used extensively
to predict discriminability in studies of categorical
perception. Since ABX discrimination experiments invariably
show greater than chance discriminability for
within-category stimuli, the FK model shows a superior fit,
since T corresponds essentially to a "d-c shift" of the
predicted discrimination curve.
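A quick numerical check of the reduction just stated: the sketch below codes Equations 2-1 and 2-3 directly (the function names are illustrative). Because T multiplies the probability that A and B receive the same label, which is close to 1 for pairs well inside a category, raising T lifts within-category predictions almost one-for-one. The labelling values 0.02 and 0.05 are arbitrary assumptions.

```python
def haskins(p1, p2):
    """Equation 2-1: ABX discrimination predicted from labelling alone."""
    return 0.5 * (1 + (p1 - p2) ** 2)


def fujisaki_kawashima(p1, p2, T):
    """Equation 2-3: phonetic comparison plus an auditory 'guessing factor' T.

    The first term covers trials on which the labels decide the response; T is the
    probability of a correct response when A and B receive the same label.
    """
    phonetic = 0.5 * ((p1 - p2) ** 2 + p1 * (1 - p2) + p2 * (1 - p1))
    same_label = p1 * p2 + (1 - p1) * (1 - p2)
    return phonetic + T * same_label


# A within-category pair near one endpoint of the continuum (assumed labelling values).
p1, p2 = 0.02, 0.05
for T in (0.5, 0.6, 0.7):
    print(f"T={T}: FK={fujisaki_kawashima(p1, p2, T):.3f}  Haskins={haskins(p1, p2):.3f}")
```

At T = 0.5 the two functions coincide; each further 0.1 of T adds about 0.093 to this pair's predicted score (0.1 times a same-label probability of 0.932).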

2.1.1 The Effect of Step Size on Discrimination

It is traditional in ABX discrimination studies to
perform the experiment using "one-step" and/or "two-step"
intervals. The companion identification test is carried out
using a set of stimuli $x_i$, $i = 1, 2, 3, \ldots, n$, where
all stimuli are separated by $\Delta x$ along the physical
stimulus continuum. The ABX test is carried out using
stimulus pairs separated by either $\Delta x$ or $2\Delta x$.
The two-step test invariably shows an overall increase in
discriminability over the one-step test, as well as broader
peaks.


² Hereafter referred to as the "FK" model.
The influence of step size on discriminability has not
received a great deal of attention in the literature. The
notions of "one-step" or "two-step" are not sufficiently
well defined to constitute any sort of standard, since the
step size itself is arbitrary. To see the influence of step
size, Equation 2-3 above can be calculated for all
combinations of two stimuli $x_1$ and $x_2$ separated by a
constant amount $\Delta x$. For purposes of illustration, the
identification function is assumed to be given by the
normal ogive

$$p(x) = \Phi\left(\frac{x - 0.5}{\sigma}\right) \qquad (2\text{-}4)$$

where $x$ is a hypothetical stimulus continuum ranging from 0
to 1 (see Fig. 2.1), $p(x)$ is the probability that a
stimulus $x$ will be classified as, say, category B, and
$\Phi$ is the normal cumulative distribution function. The
width of the transition region (as characterized by $\sigma$)
is arbitrarily set at $\sigma = 0.05$. Using the
identification function given by Equation 2-4 and a step
size of 0.05, $P_D(x_1, x_2)$ can be computed for all
possible pairings of the stimuli.³ The result is a
three-dimensional response surface as shown in Fig. 2.2.

The two dimensions in the horizontal plane represent the
positions of $x_1$ and $x_2$ along the x-continuum, and the
vertical axis ($P_D$) represents the probability of
discriminating $x_1$ and $x_2$. Profiles of constant $x_1$
represent a hypothetical fixed-standard ABX test, while
sections through this surface parallel to the diagonal
dashed line represent a variable-standard, fixed-step-size
ABX test (i.e., the usual ABX paradigm). The FK model
(Equation 2-3) predicts better than chance discriminability
(for T > 0) even when the two stimuli being contrasted are
physically identical, and this discriminability decreases
slightly in the vicinity of the boundary (see Fig. 2.3b).
Another novel feature is the slight dip on each side of the
peak of the discrimination curve (Fig. 2.3b).

³ For the rest of the discussion, the quantity
$P_D(x_1, x_2)$ will be referred to as the "discrimination
function" and $p(x)$ will be referred to as the
"identification function" or "labelling function".

Fig. 2.1. Hypothetical identification function (normal ogive
with mean 0.5 and standard deviation 0.05) used in the
calculation of the discrimination functions described in the
text.

Fig. 2.2. Calculated response surfaces $P_D(x_1, x_2)$ for the
ABX paradigm. (a) is the Haskins model (T = 0) and (b) is the
Fujisaki and Kawashima model with T = 0.2. The dashed lines
represent the zero step-size condition, $x_1 = x_2$.
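The step-size calculation can be sketched with the Python standard library's `statistics.NormalDist` standing in for the ogive of Equation 2-4 (mean 0.5, $\sigma = 0.05$). For brevity only the Haskins form (Equation 2-1) is evaluated; centring each pair on a point $c$ (so $x_1 = c - \Delta x/2$, $x_2 = c + \Delta x/2$) and the grid of centres are assumptions made here for illustration, not details given in the text.

```python
from statistics import NormalDist

OGIVE = NormalDist(mu=0.5, sigma=0.05)   # Equation 2-4 with sigma = 0.05


def p_label(x):
    """Identification function: probability that stimulus x is labelled category B."""
    return OGIVE.cdf(x)


def haskins(p1, p2):
    """Equation 2-1."""
    return 0.5 * (1 + (p1 - p2) ** 2)


def fixed_step_section(step, centres):
    """Variable-standard, fixed-step section through the surface P_D(x1, x2)."""
    return [haskins(p_label(c - step / 2), p_label(c + step / 2)) for c in centres]


centres = [i / 100 for i in range(10, 91)]           # 0.10, 0.11, ..., 0.90
one_step = fixed_step_section(0.05, centres)          # delta x = 0.05
two_step = fixed_step_section(0.10, centres)          # 2 * delta x

peak = centres[one_step.index(max(one_step))]
print(f"one-step peak {max(one_step):.3f}, two-step peak {max(two_step):.3f}, at x = {peak:.2f}")
```

Under these assumptions both sections peak at the boundary (x = 0.5), at roughly 0.57 for one step and 0.73 for two steps, consistent with the overall increase described for two-step tests.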

This model of the ABX discrimination process, while
capturing the general shape of observed discrimination
curves, suffers from rather undesirable limiting
behaviour. It seems counter-intuitive that physically
identical stimuli can be discriminated at a better than
chance level. Furthermore, that the discrimination curve
should drop at the phonetic boundary for small step sizes is
also disconcerting. Only the Haskins model (Fig. 2.3a)
demonstrates the correct limiting behaviour, since for zero
step size strictly chance discriminability is predicted at
all points along the continuum. These properties of the FK
model have remained obscured because virtually all
applications of the model to date have used experimentally
determined relative frequencies. Since identification
functions are sampled only at a few points along the
continuum (typically only 10 or so) and often show
considerable statistical scatter, predicted discrimination
curves do not show these minor effects.

Fig. 2.3. Calculated ABX discrimination functions for various
step sizes, $\Delta x$. (a) Haskins model. (b) Fujisaki and
Kawashima model with T = 0.2.

2.1.2 The AX Discrimination Test

The ABX paradigm unquestionably has been the most
popular discrimination test in studies on categorical
perception. Until recently, the AX paradigm had received
less attention. The reasons for its lack of popularity are
unclear, but Zinnes and Kurtz (1968) attribute it to

"... the very old belief that the 'same' or 'equal'
category in discrimination experiments is too
unstable, too easily influenced by the subject, and
as such gives a poor measure of a subject's optimum
discriminability." (p. 392)

However, recent studies on categorical perception have
tended to favour the AX discrimination test (e.g., Repp et
al., 1978; Carney et al., 1977; Williams, 1977; Wood, 1976;
Cutting, Rosner and Foard, 1976; Pisoni, 1973).

Following the strategy used for the FK model, a
phonetic-memory model of AX discriminability can be computed
(cf. Pollack and Pisoni, 1971; Zinnes and Kurtz, 1968).
Assume that the probabilities that two stimuli $x_1$ and $x_2$
will be categorized as belonging to, say, category B are
$p_1 = p(x_1)$ and $p_2 = p(x_2)$. The subject may then
perceive these two stimuli as AA, AB, BA or BB. If he
perceives either AB or BA, he will presumably respond
"different", and if he perceives either AA or BB he will be
forced to discriminate on the basis of non-phonetic
differences and, as in the case of the FK model, will
respond "different" with probability T. The factor T
therefore incorporates the true within-category
discriminability as well as the subject's response bias.
Assuming equal a priori probabilities of presenting $x_1$
and $x_2$ in either order ($x_1 x_2$ or $x_2 x_1$), the
various response probabilities are as shown in Table 2-1
below.


TABLE 2-1

RESPONSE PROBABILITIES FOR THE AX PARADIGM

PERCEIVED       STIMULUS ORDER
CATEGORY        x1 x2               x2 x1

A A             (1-p1)(1-p2)        (1-p2)(1-p1)
A B             (1-p1) p2           (1-p2) p1
B A             p1 (1-p2)           p2 (1-p1)
B B             p1 p2               p2 p1


Since stimulus combinations $x_1 x_2$ and $x_2 x_1$ occur with
equal frequency, the resulting proportion of "different"
responses is

$$P_D = p_1(1 - p_2) + p_2(1 - p_1) + T\left[p_1 p_2 + (1 - p_1)(1 - p_2)\right] \qquad (2\text{-}5)$$


This equation is similar in form to Equation 2-3 and has
similar properties. In this case, T is more readily
interpreted as a criterion which can be manipulated by the
subject, as well as a factor which reflects enhanced
discriminability for within-category comparisons. If the
subject chooses to ignore subtle differences between stimuli
or, alternatively, is unable to perceive any differences,
then T = 0 and Equation 2-5 becomes

$$P_D = p_1(1 - p_2) + p_2(1 - p_1) \qquad (2\text{-}6)$$

which is the result derived by Zinnes and Kurtz (1968,
p. 397).
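Table 2-1 and Equation 2-5 can be cross-checked by brute-force enumeration. The sketch below assumes, as in the text, that each stimulus of a pair is labelled independently, that the two presentation orders are equally likely, and that same-label (AA or BB) pairs draw a "different" response with probability T. The function names are illustrative only.

```python
from itertools import product


def ax_enumerated(p1, p2, T):
    """P("different") in the AX paradigm, summed over the cells of Table 2-1.

    p1, p2: probabilities that x1 and x2 are labelled category B.
    T:      probability of responding "different" to a same-label pair.
    """
    total = 0.0
    for first, second in ((p1, p2), (p2, p1)):        # orders x1x2 and x2x1, equally likely
        for l1, l2 in product((0, 1), repeat=2):      # 1 = category B, 0 = category A
            prob = (first if l1 else 1 - first) * (second if l2 else 1 - second)
            p_diff = 1.0 if l1 != l2 else T           # AB/BA -> "different"; AA/BB -> T
            total += 0.5 * prob * p_diff
    return total


def ax_closed_form(p1, p2, T):
    """Equation 2-5 (Equation 2-6 when T = 0)."""
    return p1 * (1 - p2) + p2 * (1 - p1) + T * (p1 * p2 + (1 - p1) * (1 - p2))


for p1, p2, T in [(0.1, 0.9, 0.0), (0.3, 0.6, 0.25), (0.5, 0.5, 0.5)]:
    print(p1, p2, T, ax_enumerated(p1, p2, T), ax_closed_form(p1, p2, T))
```

Evaluating either function with $p_1 = p_2$ shows the non-zero "different" rate predicted for identical stimuli; for example, $p_1 = p_2 = 0.2$ with T = 0 gives 0.32.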


This model of the AX discrimination test also shows
incorrect limiting behaviour. When the two stimuli are
physically identical ($x_1 = x_2$, so that $p_1 = p_2 = p$),
Equation 2-5 becomes

$$P_D = 2p(1 - p) + T\left[p^2 + (1 - p)^2\right] \qquad (2\text{-}7)$$

which predicts non-zero discriminability between physically
identical stimuli. This can be seen in Fig. 2.4, where
Equation 2-5 is calculated for all combinations of $x_1$ and
$x_2$.

Fig. 2.4. Calculated discrimination response surfaces for the
AX paradigm. (a) represents the case of T = 0 (i.e., purely
phonetic discrimination); (b) represents T = 0.5. The dashed
lines represent the zero step-size condition, $x_1 = x_2$.

Sections through the response surface for constant step
size (i.e., sections corresponding to $x_2 = x_1 + \Delta x$)
are shown in Fig. 2.5. This model also makes the curious
prediction that when the standard stimulus is the boundary
stimulus ($x_1 = 0.5$ in this case), the discriminability is
constant across the entire continuum. That is, when
$p(x_1) = 0.5$, Equation 2-5 reduces to

$$P_D = 0.5(1 + T) \qquad (2\text{-}8)$$

which is independent of the test stimulus or, for that
matter, of the entire test continuum.
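The flat boundary-standard prediction follows directly from Equation 2-5: with $p_1 = 0.5$, the terms in $p_2$ cancel. A quick numerical check, reusing the assumed ogive of Equation 2-4 (mean 0.5, $\sigma = 0.05$, via the standard library's `statistics.NormalDist`) and the illustrative value T = 0.5 used in Fig. 2.4(b):

```python
from statistics import NormalDist

ogive = NormalDist(mu=0.5, sigma=0.05)          # Equation 2-4, sigma = 0.05 (assumed)
T = 0.5                                         # illustrative value, as in Fig. 2.4(b)


def ax_closed_form(p1, p2, T):
    """Equation 2-5."""
    return p1 * (1 - p2) + p2 * (1 - p1) + T * (p1 * p2 + (1 - p1) * (1 - p2))


standard = 0.5                                  # the boundary stimulus, where p(0.5) = 0.5
test_stimuli = [i / 20 for i in range(21)]      # test stimuli 0.00, 0.05, ..., 1.00
curve = [ax_closed_form(ogive.cdf(standard), ogive.cdf(x), T) for x in test_stimuli]

# Every value equals 0.5 * (1 + T) (Equation 2-8) up to rounding, whatever the test stimulus.
print(min(curve), max(curve), 0.5 * (1 + T))
```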

Few  experimental  data  are  available  to  ccnfirm  or 
disconfirm  predictions  of  this  model.  Variable  standard 
(fixed  step  size)  AX  tests  generally  show  a  prominent  peak 
in  the  vicinity  of  the  phonetic  boundary  (cf .  Foreit,  1977  ; 
Hanson,  1977),  as  this  model  predicts.  With  the  exception 
of  Carney  et  al.  (1977),  fixed-standard  AX  discrimination 
data  have  not  been  presented  in  detail  in  the  literature. 

The  Carney  et  al.  data,  however,  show  a  considerable 
departure  from  the  results  predicted  by  the  AX  phonetic 
memory  model  (Equation  2-5).  Carney  et  al.  hypothesized 
that  for  a  standard  stimulus  intermediate  to  the  end-point 
and  the  boundary  the  discrimination  curve  ought  to  appear  as 
shown  in  Fig.  2.6,  and  their  results  tend  to  support  this 
view.  These  data,  although  limited,  indicate  a  major 
failing  of  the  phonetic-memory  AX  discrimination  model. 
Fig. 2.5. Calculated AX discrimination response functions for various step sizes, $\Delta x$. (a) T=0. (b) T=0.5.

Fig. 2.6. Hypothesized AX discrimination function for the fixed-standard paradigm (after Carney et al., 1977). The heavy solid line represents what would be perfectly categorical discrimination.

Taken together with the improper limiting behaviour for zero step size, it seems quite clear that this model cannot be entirely correct in its formulation.

2.1.3  Phonetic  vs.  Auditory  Discrimination 

Whereas  it  was  originally  held  that  categorical 
perception  was  entirely  a  property  of  speech  (see  Pastore, 
1976,  for  a  historical  overview),  in  recent  years  the 
tendency  more  and  more  has  been  to  view  categorical 
perception  as  resulting  from  psychophysical  processes  which 
may  have  little  or  nothing  to  do  with  the  fact  that  the 
stimuli are speech-like. Carney et al. (1977), summarizing the results of their VOT study, suggest that

"... auditory rather than exclusively phonetic explanations are the more appropriate." (p. 969).

Wood (1976) concludes that since his subjects did not overtly recognize short /pa/ and /ba/ stimuli as linguistic sounds and yet could successfully discriminate between them, the role of phonetic categorization in the determination of the "phoneme boundary effect" has perhaps been overplayed. Categorical perception experiments involving non-speech continua (e.g., Miller et al., 1976; Pastore et al., 1976; Cutting and Rosner, 1974; Locke and Kellar, 1973), as well as "phonetic categorization" by chinchillas (Kuhl and Miller, 1975), also suggest that categorical perception may reflect a more fundamental psychophysical phenomenon.

Few attempts at mathematical modelling of categorical perception have been carried out which assume auditory discriminability rather than phonetic discriminability, although various suggestions for models have been made (Miller et al., 1976; Pastore et al., 1976; Anderson et al., 1977). Anderson et al. (1977) present a neural model which has behaviour reminiscent of the observed results in discrimination studies. The complexity of their model,
however,  makes  it  difficult  to  apply  to  speech  data.  Miller 
et  al.  (1976)  espouse  the  notion  that  there  is  something  in 
the  signal  itself  which  results  in  categorical  perception. 
Specifically,  a  stimulus  from  a  categorically  perceived 
continuum  is  supposed  to  contain  a  signal  component  against 
which  the  rest  of  the  signal  is  compared.  Categorical 
perception  is  viewed  as  a  result  of  a  masked  threshold 
created  by  this  contrast  of  signal  components.  Pastore  et 
al.  (1976)  adopt  a  somewhat  similar  point  of  view,  and 
propose  that  categorical  perception  arises  from  a 

"...  common  factor  [which]  involves  either  an 
internal  (e.g.,  sensory  threshold)  or  an  external 
(e.g., a reference or interfering stimulus)
limitation,  which  is  both  stable  and  more  precisely 
defined  than  the  typical  differential  sensory  aspect 
(i.e.,  difference  limen)  of  the  continuum  under 
investigation."  (p.  687) 


However, they do not formalize their proposal, and it is not immediately apparent how this model is to be implemented.

2.2 SIGNAL DETECTION MODELS OF CATEGORICAL PERCEPTION

The  application  of  signal  detection  theory  (SDT)  to  the 
specific  problem  of  categorical  perception  has  been  almost 
completely  neglected,  the  work  by  MacMillan  et  al.  (1977) 
being  a  notable  exception.  As  yet,  no  AX  or  ABX  model  has 
been  presented  in  the  speech  perception  literature  as  a 
replacement for the Haskins/FK model of discrimination. Consequently, in this section, a model of the AX discrimination process⁴ will be developed which utilizes the familiar concepts of signal detection theory (Green and Swets, 1966).

4 A model for the ABX test will not be attempted since more than one subject strategy is possible (see MacMillan et al., 1977; Pollack and Pisoni, 1971; Pierce and Gilbert, 1958).

The phonetic-memory models described above assume an all-or-none kind of perception similar to low threshold theory (MacMillan et al., 1977). This corresponds to assuming that the perception of a signal results in an internal discrete random variable Y assuming a value of either 0 or 1, where 0 corresponds to one signal category and 1 to the other. The alternative possibility is to let Y be a continuous random variable. In this case, the physical continuum x is mapped onto a perceptual continuum y by the function


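$$Y = g(x) + \epsilon(0, \sigma^2) \qquad (2\text{-}9)$$

where $y = E(Y) = g(x)$ and $\epsilon$ is a normally distributed noise component of zero mean and variance $\sigma^2$. For the present purposes, the mapping $y = g(x)$ will be assumed to be linear, i.e., $g(x) = x$.⁵ If the perceptual continuum y is divided into two regions by a criterion $y_c$, then the probability of an arbitrary stimulus being identified as, say, category B (i.e., $Y > y_c$) is

$$P(Y > y_c) = \int_{y_c}^{\infty} \phi(y, \xi)\, d\xi \qquad (2\text{-}10)$$

where $\phi(y, \xi)$ is the normal density of Equation 2-9, with mean y and variance $\sigma^2$, evaluated at $\xi$.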
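Because the noise in Equation 2-9 is normal, the integral in Equation 2-10 has the closed form $\Phi((y - y_c)/\sigma)$. The short sketch below evaluates it under the linear mapping y = x assumed here; the function names and the example values ($\sigma = 0.1$, $y_c = 0.5$ on a 0 to 1 continuum) are illustrative assumptions, not values taken from the text.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_category_b(y: float, y_c: float, sigma: float) -> float:
    """Probability that Y = y + noise(0, sigma**2) exceeds the criterion y_c.

    Integrating the normal density from y_c to infinity (Equation 2-10)
    gives 1 - Phi((y_c - y) / sigma), which equals Phi((y - y_c) / sigma).
    """
    return normal_cdf((y - y_c) / sigma)


if __name__ == "__main__":
    # Hypothetical example: y = x, criterion at the middle of a 0..1 continuum.
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"x = {x:.2f}   P(category B) = {p_category_b(x, 0.5, 0.1):.3f}")
```

With a linear mapping the identification function is simply a cumulative normal whose steepness is set by $\sigma$ alone.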


5 This assumption, even if trivial, is necessary. Parameter x is a physical control variable which, if changed, causes a physical change in the stimulus. This change in the stimulus may or may not cause a corresponding change in the sensation variable y, depending on the function $y = g(x)$. In the analysis which follows, the independent variable will be shown as y, and it is implied that $y = x$.

The discrimination task can be modelled by computing the probability that $Y_1$ and $Y_2$ will be separated by more than some criterion $\Delta y_c$. The derivation of this model is straightforward (see Zinnes and Kurtz, 1968; Zinnes and Wolfe, 1977; Sorkin, 1962):

$$P_D = 1 - \Phi\left(\frac{y_2 - y_1 + \Delta y_c}{\sqrt{2}\,\sigma}\right) + \Phi\left(\frac{y_2 - y_1 - \Delta y_c}{\sqrt{2}\,\sigma}\right) \qquad (2\text{-}11)$$


where $\Phi$ is the cumulative normal distribution. (Pierce and Gilbert, 1958, derived a similar model for AX discrimination, but only computed $P(Y_1 - Y_2 > \Delta y_c)$ rather than $P(|Y_1 - Y_2| > \Delta y_c)$.) For purposes of comparison with the previously derived phonetic-memory model, y is assumed to range from 0 to 1, and the probability of discriminating two stimuli $y_1$ and $y_2$ can be computed from Equation 2-11.⁶ The result of so doing is shown in Fig. 2.7 for two values of $\Delta y_c$.

This model clearly cannot account for AX discrimination along a categorized continuum, since it fails to demonstrate any enhanced discriminability in the vicinity of the boundary ($x = 0.5$). Along any line $x_2 = x_1 + \Delta x$, the predicted proportion of "different" responses is constant. As the criterion $\Delta y_c$ is changed, the entire level of "different" responses increases or decreases by the same amount everywhere along this line. This model reflects "continuous" rather than "categorical" perception.
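Equation 2-11 can be checked numerically. The sketch below (hypothetical function names, illustrative parameter values) treats $Y_1$ and $Y_2$ as independent normals with common standard deviation $\sigma$, so that $Y_2 - Y_1$ has standard deviation $\sqrt{2}\sigma$. With y = x, every fixed-step section $x_2 = x_1 + \Delta x$ gives the same value, which is the flat, "continuous" behaviour described above.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_different(y1: float, y2: float, dy_c: float, sigma: float) -> float:
    """Equation 2-11: P(|Y2 - Y1| > dy_c) for independent normal Y1, Y2.

    Y2 - Y1 is normal with mean y2 - y1 and standard deviation sqrt(2)*sigma.
    """
    s = math.sqrt(2.0) * sigma
    d = y2 - y1
    return 1.0 - normal_cdf((d + dy_c) / s) + normal_cdf((d - dy_c) / s)


# Constant dispersion (y = x): a fixed-step section x2 = x1 + dx is flat,
# because the prediction depends on x2 - x1 only. Values are illustrative.
STEP, DY_C, SIGMA = 0.1, 0.25, 0.1
section = [p_different(x1, x1 + STEP, DY_C, SIGMA) for x1 in (0.0, 0.2, 0.4, 0.6, 0.8)]
print([round(p, 4) for p in section])
```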

6 Since it was assumed above that y = x, the integration can be performed using y ranging from 0 to 1 instead of x from 0 to 1.

Fig. 2.7. SDT model of AX discrimination for the case of constant dispersion. (a) $\Delta y_c = 1.0$. (b) $\Delta y_c = 0.25$.

2.2.1 Dispersion

The principal failing of this model is the assumption that $y = g(x)$ is a linear mapping of the physical dimension x onto the perceptual dimension y. This assumed linearity, $y = x$, results in

$$D(x) = \frac{dy}{dx} = 1 \qquad (2\text{-}12)$$


Stated otherwise, the dispersion dy/dx is constant.⁷ Now, the existence of a peak in the discrimination curve in categorical perception studies indicates that the dispersion is not constant. Within-category stimuli are mapped onto the perceptual dimension y such that the distance between them is small, whereas stimuli near the boundary are separated by larger perceptual distances. In other words, the dispersion is greater in the vicinity of the phonetic boundary, and this enhanced dispersion effectively defines the boundary. For an arbitrary dispersion function D(x), the position of a stimulus x along the perceptual dimension y is

$$y = \int_{x_o}^{x} D(\xi)\, d\xi \qquad (2\text{-}13)$$

where $\xi$ is a dummy variable of integration (representing distance along the physical dimension x) and $x_o$ is some convenient reference point. Cast in these terms, the enhancement of discriminability at a phoneme boundary (or other perceptual boundary) can be represented by a peak in an underlying dispersion function. A peak in the dispersion function at $x_o$ will map all values of $x < x_o$ onto one end of the y scale, and all values of $x > x_o$ onto the other end of the y scale. This results in a spreading of the y-dimension with respect to the x-continuum in the vicinity of the category boundary, as is illustrated in Fig. 2.8b for the Gaussian dispersion function shown in Fig. 2.8a. (This choice of function is without theoretical import and is used for illustrative purposes only.) Fig. 2.8c shows the same transformation in another form. It can be seen that there is a progression from one "state" to the other, with the steepness of the transition being inversely proportional to the width of the underlying dispersion function.

7 "Dispersion" will be defined as the rate at which the perceptual variable y changes with respect to the physical variable x. As a physical analogy, consider a ray of light of wavelength λ passing through a refractive medium. The optical dispersion of the medium is given by dN/dλ, where N is the refractive index. If y is the position of the beam after passing through the medium, then because of the dispersion, y will change according to how fast N changes with λ. A medium for which dN/dλ is non-zero is called a "dispersive medium".
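To make Equation 2-13 concrete, the sketch below integrates a unit-area Gaussian dispersion function numerically (trapezoidal rule) and compares the result with the Gaussian cumulative distribution, which is what the integral reduces to when the lower limit lies far below the peak. The peak location (0.5), the widths, the lower limit `x_ref = -1` and all names are illustrative assumptions; the code uses `x_ref` for the lower limit of integration and `x_peak` for the location of the dispersion peak to keep the two roles apart.

```python
import functools
import math


def gaussian_dispersion(x: float, x_peak: float = 0.5, width: float = 0.05) -> float:
    """Unit-area Gaussian dispersion function D(x) peaked at x_peak."""
    return math.exp(-((x - x_peak) ** 2) / (2.0 * width ** 2)) / (math.sqrt(2.0 * math.pi) * width)


def perceptual_position(x, dispersion, x_ref=-1.0, n=4000):
    """Equation 2-13: y(x) = integral of D(xi) d(xi) from x_ref to x (trapezoidal rule)."""
    h = (x - x_ref) / n
    total = 0.5 * (dispersion(x_ref) + dispersion(x))
    for k in range(1, n):
        total += dispersion(x_ref + k * h)
    return total * h


def normal_cdf(z: float) -> float:
    """Closed form of the same integral (in units of the width) when x_ref is far below the peak."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A narrow dispersion peak squeezes each category toward one end of the y scale
# and stretches the region around the boundary (compare Fig. 2.8b,c).
for width in (0.2, 0.05, 0.01):
    d = functools.partial(gaussian_dispersion, width=width)
    ys = [perceptual_position(x, d) for x in (0.3, 0.45, 0.5, 0.55, 0.7)]
    print(f"width = {width:<5} y =", [round(y, 3) for y in ys])
```

As the width shrinks, y(x) approaches the unit step described next; as it grows, y(x) becomes nearly linear over the continuum.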


Fig. 2.8. (a) Gaussian dispersion function. (b) Effect of dispersion on the sensation continuum y. (c) The x→y mapping caused by the dispersion (a).

As the width of the dispersion function decreases to zero, the dispersion function approaches a delta function and y(x) becomes a unit step function. This represents perfect categorization, since y can only take on values of 0 or 1. Thus, purely phonetic categorization corresponds to infinite dispersion at the boundary. This is one extreme case of the model. Another extreme occurs when the dispersion is constant (already discussed above), and this corresponds to continuous perception. But, and this is the
important  point,  depending  on  the  width  and  shape  of  the 
dispersion  function,  various  degrees  of  "categorical 
perception"  are  possible.  It  is  therefore  probably 
meaningless  to  insist  on  a  dichotomous  distinction  between 
"categorical"  perception  and  "continuous"  perception,  since 
perfect  examples  of  either  have  yet  to  be  found.  It  is 
probably  more  reasonable  to  view  some  continua  as  either 
"more  categorical"  or  "less  categorical"  than  others. 
Dispersion  is  a  physical  property  of  any  signal  processing 
system,  although  in  most  non-biological  systems  it  is 
designed  to  be  constant  (i.e.,  some  physical  parameter  maps 
linearly onto some measurable quantity, e.g., voltage).⁸
The  argument  should  not  be  whether  or  not  dispersion  exists, 
but  rather  how  much  dispersion  exists. 

Suggesting  dispersion  as  a  property  of  the  receptive 
medium  captures  conceptually  and  mathematically  what  has 
been  observed  in  discrimination  studies  all  along:  the 
organism  does  not  respond  equally  to  stimuli  taken  from 
different points on a continuum. Expressed differently, the sensation variable Y associated with a stimulus x need not, in general, change linearly with x. The nonlinear mapping between Y and x shown in Fig. 2.8b appears to be the kind of non-linearity between "acoustics and perception" which Elman (1977) suggests, the "stable dichotomy" to which Pastore et al. (1976) refer, and also Ades' (1977) "Type 1" effect.

8 To continue the analogy with optical dispersion, dispersion may result from physical properties of the medium (e.g., refraction by a prism), or from the properties of a particular device (e.g., a diffraction grating).

2.2.2  Dispersion  and  AX  Discrimination 

The  basic  SDT  model  of  discrimination  has  already  been 
given  (Equation  2-11).  To  incorporate  dispersion,  the 
integration  is  carried  out  using  y  as  the  variable  of 
integration,  where  y  and  x  are  related  by  Equation  2-13. 

Once D(x) is specified, the corresponding model for AX discrimination can be computed. The class of dispersion functions of interest at the moment is unimodal, and for illustration purposes a Gaussian (Fig. 2.8a) is convenient:

$$D(x) = \frac{1}{\sqrt{2\pi}\,\sigma_D}\, e^{-\frac{(x - x_o)^2}{2\sigma_D^2}} \qquad (2\text{-}14)$$
Using Equations 2-11, 2-13 and 2-14, the AX discrimination function can be calculated for all values of $x_1$ and $x_2$. The results for various observer criteria, $\Delta y_c$, and dispersion widths, $\sigma_D$, are shown in Fig. 2.9. Comparing these surfaces with Fig. 2.4, a general similarity is seen, particularly for high dispersion (characterized by a small value of $\sigma_D$). As $\sigma_D$ is decreased, D(x) approaches a delta function and perception becomes perfectly categorical, as shown in Fig. 2.9e. While this is not physically plausible, it is certainly the desired limiting behaviour of the model. In the other extreme, as the width of the dispersion function increases, D(x) becomes approximately constant over the range of x, and perception becomes "continuous" (compare Fig. 2.9f with Fig. 2.7).

Fig. 2.9. SDT model of AX discrimination, assuming an underlying Gaussian dispersion function. $\sigma$ is the same for all six figures ($\sigma = 0.1$). (a) $\Delta y_c = 0.5$, $\sigma_D = 0$. (b) $\Delta y_c = 0.5$, $\sigma_D = 0.2$. (c) $\Delta y_c = 0.1$, $\sigma_D = 0.05$. (d) $\Delta y_c = 0.1$, $\sigma_D = 0.2$. (e) Perfectly categorical discrimination (small $\sigma_D$). (f) Continuous discrimination (large $\sigma_D$).


The behaviour of this model for zero or small step sizes is quite different from that of the phonetic-memory model. In the present case (see Fig. 2.10), for $x_1 = x_2$, the discrimination function has a constant value (i.e., the number of false alarms) dependent only on the observer criterion $\Delta y_c$:

$$P_D = 2\left[1 - \Phi\left(\frac{\Delta y_c}{\sqrt{2}\,\sigma}\right)\right] \qquad (2\text{-}15)$$

This is true even in the limit of the dispersion function becoming a delta function, in which case the discrimination function has a singularity at $x_1 = x_2 = 0.5$ (see Fig. 2.9e). As the step size is increased, a peak in the discrimination function appears and broadens in much the same fashion as in Fig. 2.5.
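Equation 2-15 follows from Equation 2-11 by setting $y_1 = y_2$, which holds for $x_1 = x_2$ whatever the mapping y = g(x). The sketch below (hypothetical names, illustrative $\sigma = 0.1$) confirms that the false-alarm level is the same everywhere on the continuum and falls as the criterion $\Delta y_c$ grows.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_different(y1: float, y2: float, dy_c: float, sigma: float) -> float:
    """Equation 2-11 on the perceptual scale y."""
    s = math.sqrt(2.0) * sigma
    d = y2 - y1
    return 1.0 - normal_cdf((d + dy_c) / s) + normal_cdf((d - dy_c) / s)


def false_alarm_rate(dy_c: float, sigma: float) -> float:
    """Equation 2-15: P_D for physically identical stimuli."""
    return 2.0 * (1.0 - normal_cdf(dy_c / (math.sqrt(2.0) * sigma)))


SIGMA = 0.1
for dy_c in (0.05, 0.1, 0.25, 0.5):
    # Identical pairs give the same value at any point y on the continuum.
    at_points = [p_different(y, y, dy_c, SIGMA) for y in (0.0, 0.3, 0.5, 0.9)]
    print(f"dy_c = {dy_c:<5} P_D(x, x) = {at_points[0]:.4f}  "
          f"Eq. 2-15 = {false_alarm_rate(dy_c, SIGMA):.4f}")
```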


Fig. 2.10. AX discrimination functions for the dispersion model, for various step sizes $\Delta x$. (a) $\Delta y_c = 0.5$. (b) $\Delta y_c = 0.05$.

In a fixed-standard comparison in which the boundary stimulus is chosen as the standard, the discrimination function shows a dip at the boundary, and increases monotonically with distance away from the boundary. (This is to be contrasted with the behaviour of the phonetic-memory AX model, which predicted that the discrimination would be constant.) This is intuitively satisfying, since a boundary stimulus is a boundary stimulus not because it is an equally good exemplar of either category, but because it is an equally poor exemplar, and hence it should be distinguishable from good exemplars of either category.

It also follows from the phonetic-memory model (Equation 2-5) that when T=0 and the standard stimulus is one of the end-point stimuli (i.e., $p(x_1) = 0$), the discrimination function is identical to the identification function. The present model predicts this behaviour only when the width of the dispersion function is sufficiently narrow and the criterion $\Delta y_c$ is large (see Fig. 2.11). For small $\Delta y_c$, the number of false alarms increases, and the inflection point of the discrimination curve shifts towards the position of the standard stimulus. When $\Delta y_c$ is made very large, the number of "different" responses in the category opposite from that of the standard decreases uniformly. This is to be expected, since a large $\Delta y_c$ corresponds to a subject responding "same" even when the AX stimuli are perceived in clearly opposite phonetic categories. If little or no discrimination within categories is possible, then on the average he may respond "same" to any stimulus drawn from this category. The degree of categorization of the continuum is indicated by the extent to which the discrimination curve can be "pushed" past the identification curve.

Fig. 2.11. Comparison of variable step size AX discrimination curves when the standard stimulus is $x_1 = 0$. (a) Phonetic memory model (Equation 2-5). The various curves are differentiated by different values of T. (b) Dispersion model (Equations 2-11, 2-13 and 2-14). The parameters of the curves are $\Delta y_c$, the observer criterion. In both (a) and (b), $\sigma$ is 0.05. The dashed line represents the corresponding identification function (i.e., Equation 2-4).

In  summary,  this  "auditory-dispersion"  model  of  the 
discrimination  process  predicts  the  following  effects: 

(a) for a zero step size, discrimination will be constant at a value dependent on the criterion $\Delta y_c$ of the observer

(b) for a unimodal dispersion function D(x), the discrimination function for the constant step size paradigm is also unimodal

(c) the number of "different" judgements for within-category stimuli is dependent both on the observer criterion and within-category discriminability

(d) for a fixed-standard test, when an endpoint stimulus is used as the standard, the resulting discrimination curve will be ogival in shape and shifted from the labelling boundary by an amount determined by the observer same-different criterion, $\Delta y_c$ (as shown in Fig. 2.11b)

(e) a unimodal dispersion function will result in a "categorization" of the continuum, and the width of the transition region of the identification function will reflect both the variance of the internal noise associated with the signal transformation (Equation 2-9) and the width of the underlying dispersion function

The dispersion AX model makes quite different predictions with respect to the number of false alarms, i.e., the number of "different" judgements which result when the stimuli are physically identical. The number of false alarms depends on the observer criterion $\Delta y_c$ and is given by Equation 2-15 above. However, the same condition can result if D(x) vanishes over the interval between two physically different stimuli $x_1$ and $x_2$, since their perceptual values $y_1$ and $y_2$ will then be equal. Thus, to the observer, there is no difference between two stimuli which are physically identical and two stimuli which are physically different as long as they map in both cases onto the same value of y. It follows that for a fixed criterion $\Delta y_c$,

$$P_D(x_1, x_2) \geq P_D(x_1, x_1) \qquad (2\text{-}16)$$

for arbitrary stimuli $x_1$ and $x_2$. This is equivalent to saying that two stimuli cannot be more perceptually similar than when they are physically identical.
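The argument behind Equation 2-16 can be illustrated with a mapping whose dispersion vanishes on part of the continuum. In the sketch below, the piecewise mapping `y_plateau`, the grid and the parameter values are hypothetical, chosen only for illustration: two physically different stimuli inside the flat stretch yield exactly the false-alarm level, and no pair on the grid falls below it.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_different(y1: float, y2: float, dy_c: float, sigma: float) -> float:
    """Equation 2-11 on the perceptual scale y."""
    s = math.sqrt(2.0) * sigma
    d = y2 - y1
    return 1.0 - normal_cdf((d + dy_c) / s) + normal_cdf((d - dy_c) / s)


def y_plateau(x: float) -> float:
    """Hypothetical mapping y = g(x) with D(x) = dy/dx = 0 on 0.2 <= x <= 0.4."""
    if x < 0.2:
        return x
    if x <= 0.4:
        return 0.2
    return x - 0.2


DY_C, SIGMA = 0.1, 0.1
identical = p_different(y_plateau(0.3), y_plateau(0.3), DY_C, SIGMA)
flat_pair = p_different(y_plateau(0.25), y_plateau(0.35), DY_C, SIGMA)

# Smallest P_D over all pairs on a grid: it occurs where y1 = y2 (Equation 2-16).
grid = [i / 20 for i in range(21)]
smallest = min(p_different(y_plateau(a), y_plateau(b), DY_C, SIGMA)
               for a in grid for b in grid)
print(identical, flat_pair, smallest)
```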

2.2.3  Finding  a  Dispersion  Function 

The  primary  deficiency  of  the  above  model  is  the  ad  hoc 
specification  of  the  dispersion  function.  Since  it  cannot 
be  directly  observed,  it  must  be  inferred  by  fitting  the 
data  to  a  model  in  which  some  explicit  form  of  D(x)  is 
assumed.  A  preferable  approach  is  to  arrive  at  a 
theoretical  equation  for  the  dispersion  curve,  in  which  case 
Equations  2-11  and  2-13  apply  directly.  Failing  that,  a 
reasonable  alternative  is  to  choose  some  function  which  has 
the  desired  attributes,  and  use  the  model  to  extract 
estimates of the parameters. This results in a curve-fitting model, but one which at least will allow parameterization of the perceptual continuum. A Gaussian may serve as a suitably flexible choice for a first approximation, but there is no theoretical motivation for this function. In any event, given a dispersion function with width $\beta$,⁹ the degree of categorization of a continuum can be related to the dispersion power of the system, and a suitable index would be¹⁰

$$\varepsilon = \frac{x_b - x_a}{\beta} \qquad (2\text{-}17)$$

where $x_a$ and $x_b$ represent the endpoint stimuli and $\beta$ is the width of the dispersion curve at half height. Continuous perception is then represented by $\varepsilon = 0$ and perfect categorical perception by $\varepsilon = \infty$.
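A minimal sketch of the index in Equation 2-17, with $\beta$ estimated as the full width at half height of a sampled dispersion function, so that the estimate does not presuppose a Gaussian (in line with footnote 9). The grid-based width estimate, the function names and the example widths are assumptions. For a Gaussian of standard deviation $\sigma_D$ the width at half height is $2\sqrt{2\ln 2}\,\sigma_D \approx 2.355\,\sigma_D$, which gives a convenient check on the estimator.

```python
import functools
import math


def gaussian_dispersion(x: float, x_peak: float = 0.5, width: float = 0.05) -> float:
    """Unit-area Gaussian dispersion function (the form of Equation 2-14)."""
    return math.exp(-((x - x_peak) ** 2) / (2.0 * width ** 2)) / (math.sqrt(2.0 * math.pi) * width)


def full_width_half_height(f, lo: float, hi: float, n: int = 20001) -> float:
    """Width at half height of a single-peaked f, estimated on a uniform grid over [lo, hi].

    Assumes the peak and both half-height points lie inside [lo, hi]; the
    estimate is accurate to about two grid spacings.
    """
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    values = [f(x) for x in xs]
    half = max(values) / 2.0
    above = [x for x, v in zip(xs, values) if v >= half]
    return above[-1] - above[0]


def categorization_index(f, x_a: float, x_b: float) -> float:
    """Equation 2-17: epsilon = (x_b - x_a) / beta, with beta the width at half height."""
    beta = full_width_half_height(f, x_a, x_b)
    return (x_b - x_a) / beta


for width in (0.2, 0.05, 0.01):
    d = functools.partial(gaussian_dispersion, width=width)
    print(f"sigma_D = {width:<5} epsilon = {categorization_index(d, 0.0, 1.0):.2f}")
```

Narrowing the dispersion peak drives $\varepsilon$ upward, toward the perfectly categorical limit.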


9 Since the dispersion function is not necessarily Gaussian, it is better to use the full width at half height (i.e., the width of the peak measured at half its maximum value) as a measure of the width of the dispersion peak.

10 Fujisaki and Kawashima (1970) use the index $\Delta x/\sigma_D$ to characterize the fixed step size ABX discrimination curve, but since this involves the step size $\Delta x$, it is a measure of the paradigm and not of the continuum.

2.3 A DETECTOR MODEL OF CATEGORICAL PERCEPTION

It is a common proposal in speech perception studies that decoding of the speech signal is mediated by acoustic and/or phonetic "feature detectors". The major support for this theory comes from selective adaptation studies, where it is proposed that shifts in the labelling curves observed after repeated presentation of a stimulus result from a desensitization or fatiguing of one of the detectors which span the stimulus continuum (e.g., Eimas and Corbit, 1973; Miller, 1975; Ainsworth, 1977). The general results of the
many  selective  adaptation  studies  have  not  required 
rejection  of  this  view,  although  it  is  not  universally 
accepted  that  this  is  the  appropriate  explanation  for  the 
boundary shifts (Simon and Studdert-Kennedy, 1978; Elman, 1977). Part of the difficulty of using the detector construct as a basis for the theory of selective adaptation is its lack of specificity. With the exception of Elman (1977), few computations have been performed to date to
investigate  what  properties  such  a  pair  of  detectors  might 
have.  Elman,  investigating  the  possibility  that  the 
observed  phonetic  boundary  shifts  could  be  accounted  for  by 
changes  in  observer  criterion,  proposes  the  following 
detector  model: 

(a)  two  detectors  span  the  physical  stimulus 
continuum 

(b)  the  detector  response  functions  are  Gaussian 

(c) the outputs of the detectors, $u_1$ and $u_2$, are compared at a higher level

(d) the phoneme boundary is defined by $u_1 = u_2$

In this section, an analysis of such a two-detector configuration is undertaken. Consider two detectors with response functions given by


$$u_1 = h_1(y) \qquad (2\text{-}18a)$$

$$u_2 = h_2(y) \qquad (2\text{-}18b)$$


where y is the perceptual dimension corresponding to the physical continuum x. The detector outputs, $U_1$ and $U_2$, will be assumed to be normally distributed random variables with equal variance $\sigma^2$ (as in Equation 2-9).¹¹ Given a transformation $y = g(x)$ between the physical control variable x (the physical parameter which defines the continuum) and y, the perceptual continuum which the detectors span, the detector outputs are

$$u_1 = h_1(g(x)) = H_1(x) \qquad (2\text{-}19a)$$

and

$$u_2 = h_2(g(x)) = H_2(x) \qquad (2\text{-}19b)$$


11 The assumption of equal variance is not necessary, but in the interests of mathematical tractability, it will be assumed.

For the present investigation, it will be assumed that the trivial relation $y = x$ holds (i.e., dispersion is constant), in which case $u = h(x)$. Assuming, as does Elman (1977, 1979), that $h_1(x)$ and $h_2(x)$ are Gaussian,¹²

$$u_1(x) = \frac{1}{\sqrt{2\pi}\,\sigma_D}\, e^{-\frac{(x - x_{10})^2}{2\sigma_D^2}} \qquad (2\text{-}20a)$$

$$u_2(x) = \frac{1}{\sqrt{2\pi}\,\sigma_D}\, e^{-\frac{(x - x_{20})^2}{2\sigma_D^2}} \qquad (2\text{-}20b)$$


where $\sigma_D$ is the standard deviation of the detector response function and $x_{10}$ and $x_{20}$ are the locations of maximum sensitivity of the response functions. Two Gaussian detector response functions are shown in Fig. 2.12, where the x continuum ranges from 0 to 1.¹³

12 The analysis which follows does not require that the detector functions be Gaussian or even unimodal. However, for computational purposes, some specific form is required, and a Gaussian is a convenient choice. Although Elman (1977) performed his computations using Gaussian functions, and suggestions have been made that the response functions are possibly Gaussian (e.g., Hanson, 1977), there is neither theoretical motivation nor empirical support for this choice of function. Its only virtue is that it is familiar, unimodal and specified by only two parameters.

13 Again, since it is assumed that y = x, the response functions can be calculated as if they were spanning the x continuum directly.

Fig. 2.12. Two Gaussian detector response functions spanning the x-continuum.

Fig. 2.13. Outputs $u_1$ and $u_2$ of the two detectors plotted as a probability density in a two-dimensional decision plane; $u_1 = E(U_1)$ and $u_2 = E(U_2)$, where $U_1$ and $U_2$ are independent random variables.

As a graphical construct for analyzing the behaviour of such a system, the variables $U_1$ and $U_2$ for a given x will be displayed as a circular normal probability density function¹⁴ in a $u_1$-$u_2$ signal space (see Fig. 2.13). The
presentation of a given stimulus x results in detector outputs $u_1$ and $u_2$ as given by Equations 2-20 and, as x takes on values at various points on the x continuum, the points $(u_1, u_2)$ trace out a line in this space. The locus of the points $(u_1, u_2)$ will be referred to as the "stimulus trajectory", and represents a mapping of the x-continuum onto this space. Fig. 2.14 shows the stimulus trajectory for various values of x (in multiples of 0.05) for the Gaussian detector functions shown in Fig. 2.12. The points $(u_1(x), u_2(x))$ are the centroids of a circular normal probability density function of variance $\sigma^2$. The decision line is represented by the dashed line at 45 degrees, which corresponds to $u_1 = u_2$. The probability that a given stimulus x will be classified as belonging to category 2 (i.e., $U_2 > U_1$) is then


$$P(U_2 > U_1) = \Phi\left(\frac{u_2 - u_1}{\sqrt{2}\,\sigma}\right) \qquad (2\text{-}21)$$


where $(u_2 - u_1)/\sqrt{2}$ is the perpendicular distance from point $(u_1, u_2)$ to the decision line. The identification function for the stimulus trajectory shown in Fig. 2.14 will be ogival in shape, since the decision line is crossed only once. For more complicated detector response functions, as x takes on successive values along the continuum, depending on the nature of the functions $h_1(x)$ and $h_2(x)$, the stimulus trajectory may cross the decision axis only once, or perhaps several times. Multimodal response functions will in general lead to multimodal identification functions.

14 This assumes that the noise sources for the two detectors are uncorrelated. If the noise sources are correlated, the probability density function is still bivariate but not circular. This assumption is made for mathematical convenience but in practice needs to be demonstrated.

Fig. 2.14. The stimulus trajectory created by the locus of points $(u_1(x), u_2(x))$.
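The two-detector identification function of Equations 2-20 and 2-21 can be sketched as follows. The detector centres (0.25 and 0.75), the widths $\sigma_D = 0.25$ and $\sigma = 0.2$, and the function names are illustrative assumptions, and y = x is assumed as in the text. Because the two response functions are mirror images about x = 0.5, the boundary $u_1 = u_2$ falls there, and between the two detector peaks the category-2 probability rises monotonically.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def detector(x: float, centre: float, sigma_d: float) -> float:
    """Gaussian detector response function (form of Equations 2-20a and 2-20b)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma_d ** 2)) / (math.sqrt(2.0 * math.pi) * sigma_d)


def trajectory_point(x, x10=0.25, x20=0.75, sigma_d=0.25):
    """Point (u1(x), u2(x)) on the stimulus trajectory."""
    return detector(x, x10, sigma_d), detector(x, x20, sigma_d)


def p_category_2(x, sigma=0.2, **detector_params):
    """Equation 2-21: P(U2 > U1); (u2 - u1)/sqrt(2) is the distance to the line u1 = u2."""
    u1, u2 = trajectory_point(x, **detector_params)
    return normal_cdf((u2 - u1) / (math.sqrt(2.0) * sigma))


for x in (0.0, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0):
    u1, u2 = trajectory_point(x)
    print(f"x = {x:.2f}  (u1, u2) = ({u1:.3f}, {u2:.3f})  P(cat 2) = {p_category_2(x):.4f}")
```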

2.3.1  The  Dispersion  Function 

In order to derive an expression for the dispersion
function D(x), it is necessary to define a single decision
variable, w, from the two detector outputs. The most
obvious choice is

$$ w = \frac{u_2 - u_1}{\sqrt{2}} \tag{2-22} $$

which is the decision variable used in the identification
model. From the previous definition of the one-dimensional
dispersion function (Equation 2-13), the corresponding two-dimensional
function can be stated:

$$ w = \int_{C_{AB}} \mathbf{D}(\mathbf{r}) \cdot d\mathbf{r} \tag{2-23} $$

where r = u1(x) i + u2(x) j, and A and B are two arbitrary
points in the plane connected by the direct arc C_AB. Now,
since dr = du1 i + du2 j and du1 = (du1/dx) dx, etc.,
Equation 2-23 can be written as


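$$ w(x) = \int_{x_0}^{x} \mathbf{D}(\mathbf{r}) \cdot \left( u_1'\,dx\,\mathbf{i} + u_2'\,dx\,\mathbf{j} \right) \tag{2-24} $$

where u1' = du1/dx and u2' = du2/dx. In order for this integral
to be path independent (which is equivalent to stating that
the similarity between two stimuli x1 and x2 depends only on
their respective positions in the u1-u2 plane), D(r) must be
related to w as follows:

$$ \mathbf{D}(\mathbf{r}) = \nabla w \tag{2-25} $$

The above integral thus becomes

$$ w(x) = \int_{x_0}^{x} \left( \frac{\partial w}{\partial u_1}\,u_1' + \frac{\partial w}{\partial u_2}\,u_2' \right) dx \tag{2-26} $$

and it follows that the dispersion function can be specified
as a function of x as: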
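$$ D(x) = \frac{\partial w}{\partial u_1}\,u_1' + \frac{\partial w}{\partial u_2}\,u_2' \tag{2-27} $$

Now, assuming w = (u2 - u1)/√2,

$$ \frac{\partial w}{\partial u_1} = -\frac{1}{\sqrt{2}} \qquad \text{and} \qquad \frac{\partial w}{\partial u_2} = \frac{1}{\sqrt{2}} \tag{2-28} $$

so the dispersion function becomes

$$ D(x) = \frac{u_2'(x) - u_1'(x)}{\sqrt{2}} \tag{2-29} $$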


which is evidently the form suggested by Elman (1977).15 For
the detector functions defined by Equations 2-20, this
dispersion function can be calculated, and is shown as the
dashed line in Fig. 2.15. The point of maximum dispersion
is located at the category boundary (u2 = u1).

This  is  not  the  only  possible  choice  of  decision 
variable which satisfies relation 2-25. If w is defined as
the angle between the line joining the origin and the point
(u1,u2) and the decision line, i.e.,

$$ \tan w = \frac{u_2 - u_1}{u_1 + u_2} \tag{2-30} $$

the dispersion function is found to be

$$ D(x) = \frac{u_1(x)\,u_2'(x) - u_1'(x)\,u_2(x)}{u_1^2(x) + u_2^2(x)} \tag{2-31} $$

The denominator is just the squared length of the vector r =
u1 i + u2 j, and thus represents a form of "intensity"
normalization. The numerator, on the other hand, is just
the Wronskian of the two detector response functions u1(x)
and u2(x), i.e.,

$$ W(u_1, u_2, x) = \begin{vmatrix} u_1 & u_2 \\ u_1' & u_2' \end{vmatrix} = u_1 u_2' - u_1' u_2 \tag{2-32} $$

15 Elman does not use the term "dispersion", but uses the
phrase "...discriminability, as measured by the difference
in slopes" (p. 5). Evidently, he is employing much the same
concept, but no mathematical details are given.

Fig. 2.15. Various dispersion functions for two Gaussian detectors:
(dashed) Equation 2-29; (solid) Equation 2-31; (dash-dotted) Equation 2-34


The two functions u1(x) and u2(x) are independent only where
the Wronskian is non-zero. Thus, assuming that u1 and u2 do
not go to zero simultaneously, it follows that when the
Wronskian is non-zero, the dispersion is non-zero. The
solid line in Fig. 2.15 shows this dispersion for two
Gaussian detectors.
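The equivalence between the angular decision variable of Equation 2-30 and the Wronskian form of Equation 2-31 can be checked numerically. The sketch below assumes Gaussian detectors with hypothetical centres 0.3 and 0.7 and width 0.15. It compares Equation 2-31 with a central-difference derivative of w = atan2(u2, u1) - π/4, which has the same tangent as Equation 2-30, and then locates the peak of the dispersion on a grid.

```python
import math

X10, X20, WIDTH = 0.3, 0.7, 0.15  # hypothetical detector centres and common width

def u(x, center):
    """Gaussian detector response exp(-(x - center)^2 / (2 WIDTH^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * WIDTH ** 2))

def du(x, center):
    """Derivative of the Gaussian response with respect to x."""
    return -(x - center) / WIDTH ** 2 * u(x, center)

def angle_w(x):
    """Angle between (u1, u2) and the 45-degree decision line (cf. Eq. 2-30)."""
    return math.atan2(u(x, X20), u(x, X10)) - math.pi / 4.0

def dispersion_angular(x):
    """Eq. 2-31: Wronskian u1*u2' - u1'*u2 over the squared length u1^2 + u2^2."""
    u1, u2 = u(x, X10), u(x, X20)
    wronskian = u1 * du(x, X20) - du(x, X10) * u2
    return wronskian / (u1 ** 2 + u2 ** 2)

def dispersion_numeric(x, h=1e-6):
    """Central-difference estimate of dw/dx, for comparison with Eq. 2-31."""
    return (angle_w(x + h) - angle_w(x - h)) / (2.0 * h)

if __name__ == "__main__":
    grid = [i / 1000 for i in range(1001)]
    peak = max(grid, key=dispersion_angular)
    print(f"peak of D(x) at x = {peak:.3f}")
    for x in (0.1, 0.4, 0.5, 0.6):
        print(x, dispersion_angular(x), dispersion_numeric(x))
```

For these symmetric detectors the grid maximum falls at x = 0.5, where u1 = u2.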

A third possible measure of similarity of signals in
the u1-u2 plane is Euclidean distance, measured from some
arbitrary but fixed reference point (u10, u20):

$$ w = \sqrt{(u_1 - u_{10})^2 + (u_2 - u_{20})^2} \tag{2-33} $$

To express this distance strictly as a function of x, it is
necessary to assume some reference point (u10, u20). It is
neither clear what this point should be, nor is w
independent of this choice. The dispersion function

$$ D(x) = \sqrt{\left(u_1'(x)\right)^2 + \left(u_2'(x)\right)^2} \tag{2-34} $$

defines w as distance measured along the direct arc r(x)
from some arbitrary reference point, but this integral is
not path independent. This dispersion function is shown as
the dash-dotted line in Fig. 2.15.


There is no obvious choice between the above three
possible dispersion functions. All show a peak at the
category boundary, and decrease monotonically away from the
boundary, at least in the immediate vicinity of the
boundary. Equation 2-31 has the most mathematically
desirable properties and is unimodal. This function
corresponds to an angular metric. Equation 2-29 has the
virtue of using the same decision variable as the labelling
function, but it is not unimodal. The last metric,
Euclidean distance, was the choice of metric of Zinnes and
Wolfe (1977) in their formulation of a model of
same-different discrimination for a two-dimensional visual task.
However, the w-x mapping given by Equation 2-33 is dependent
on the particular path r(x) = u1(x) i + u2(x) j, and hence
is more a property of the detector functions than it is of
the u1-u2 space. For this reason, it will not be considered
further.
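For a side-by-side look at the three candidates, the sketch below evaluates Equations 2-29, 2-31 and 2-34 for two Gaussian detectors. The centres follow Fig. 2.16. The width of 0.15 is a hypothetical value, and the detailed shapes, in particular any secondary peaks of Equation 2-29, depend on it. With this width all three functions have a local maximum at the boundary x = 0.5. Equation 2-29 is also signed and takes negative values, for example at x = 0.2.

```python
import math

X10, X20, WIDTH = 0.3, 0.7, 0.15  # hypothetical detector centres and common width

def responses(x):
    """Gaussian detector outputs u1, u2 and their x-derivatives u1', u2'."""
    u1 = math.exp(-((x - X10) ** 2) / (2 * WIDTH ** 2))
    u2 = math.exp(-((x - X20) ** 2) / (2 * WIDTH ** 2))
    du1 = -(x - X10) / WIDTH ** 2 * u1
    du2 = -(x - X20) / WIDTH ** 2 * u2
    return u1, u2, du1, du2

def d_linear(x):
    """Eq. 2-29: (u2' - u1') / sqrt(2); a signed quantity."""
    _, _, du1, du2 = responses(x)
    return (du2 - du1) / math.sqrt(2)

def d_angular(x):
    """Eq. 2-31: Wronskian divided by the squared length of (u1, u2)."""
    u1, u2, du1, du2 = responses(x)
    return (u1 * du2 - du1 * u2) / (u1 ** 2 + u2 ** 2)

def d_arclength(x):
    """Eq. 2-34: speed along the stimulus trajectory, sqrt(u1'^2 + u2'^2)."""
    _, _, du1, du2 = responses(x)
    return math.hypot(du1, du2)

if __name__ == "__main__":
    print("   x   Eq.2-29  Eq.2-31  Eq.2-34")
    for i in range(21):
        x = 0.05 * i
        print(f"{x:4.2f} {d_linear(x):8.3f} {d_angular(x):8.3f} {d_arclength(x):8.3f}")
```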

2.3.2 Calculating the AX Discrimination Function

Given the dispersion function D(x) defined by Equation
2-31, the AX discrimination function can be calculated from
Equations 2-11 and 2-13. For the case of two Gaussian
detectors, the corresponding AX discrimination functions are
shown in Fig. 2.16. It can be seen that this AX
discrimination function is virtually identical to that shown
in Fig. 2.9, and thus has similar properties.

In the spirit of the concept of "detector", the
variables u1 and u2 used in the above analyses presumably
represent  some  form  of  neural  excitation  associated  with  the 
detection  of  signals  specified  by  the  x  continuum.  The 
closer  the  stimulus  is  to  the  detector  maximum,  the  greater 
the  degree  of  excitation.  If  this  is  the  case,  the  output 
of  the  detector  (for  a  phonetic  continuum)  is  the  "phonetic 
value"  of  the  stimulus.  The  recognition  of  a  particular 
phonetic  category  is  then  dependent  on  the  relative 
strengths  of  excitation  of  two  neural  populations  which 
correspond  to  the  two  detectors. 

Discrimination,  then,  presents  an  interesting  problem. 
If  perception  is  mediated  by  detectors  in  the  form  described 
above, then in order for discrimination to occur at a
subphonetic level, the outputs of both detectors must leave
traces. That is, the values of both u1 and u2 must be
"remembered". For phonetic level discrimination, on the
other hand, only which detector was excited the most
strongly need be remembered. An alternative possibility is
that the auditory representation of the stimulus (i.e., the
value of x) is remembered. In general, if the mapping y =
g(x) shows no enhanced dispersion at any point along the
continuum, then discrimination will not be enhanced, as
shown in Fig. 2.7. If y does show enhanced dispersion, then
a "natural perceptual boundary" may exist along the x
continuum. It is conceivable that speech categories would
be structured around any such extant natural perceptual
boundaries rather than the converse.

Fig. 2.16. Calculated AX discrimination functions for two Gaussian
detector response functions positioned at x = 0.3 and
x = 0.7 (i.e., Fig. 2.12). The dispersion function
is given by Equation 2-31. (a) Δy_c = 0.5, (b) Δy_c = 0.05,
where Δy_c is the criterion for the same/different
decision

2.4  APPLICATION  TO  SELECTIVE  ADAPTATION 


The  model  of  categorical  perception  as  mediated  by  a 
two-detector  configuration  will  now  be  investigated  with 
respect  to  selective  adaptation.  Since  the  effect  of 
adaptation,  as  commonly  supposed,  is  to  desensitize  one  of 
the detectors, this can be modelled by incorporating scaling
factors a1 and a2 into the detector response functions H1
and H2 defined by Equations 2-20. That is,

$$ u_1(x) = a_1 \exp\!\left[ -\frac{(x - x_{10})^2}{2\sigma_D^2} \right] \tag{2-35a} $$

$$ u_2(x) = a_2 \exp\!\left[ -\frac{(x - x_{20})^2}{2\sigma_D^2} \right] \tag{2-35b} $$


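The boundary stimulus is characterized by the value of x for
which u1(x) = u2(x), in which case

$$ a_1 \exp\!\left[ -\frac{(x - x_{10})^2}{2\sigma_D^2} \right] = a_2 \exp\!\left[ -\frac{(x - x_{20})^2}{2\sigma_D^2} \right] \tag{2-36} $$

The solution for x is

$$ x = \frac{x_{10} + x_{20}}{2} + \frac{\sigma_D^2}{x_{20} - x_{10}} \ln\frac{a_1}{a_2} \tag{2-37} $$

Now, since the unadapted boundary is just x_b = (x10 + x20)/2,
Equation 2-37 becomes

$$ x_s = \frac{\sigma_D^2}{x_{20} - x_{10}} \ln\frac{a_1}{a_2} \tag{2-38} $$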


where x_s = x - x_b is the boundary shift. This has three
implications. First, desensitization of one of the detectors
will cause a boundary shift. Second, desensitization of
both detectors simultaneously and by the same fraction
(i.e., such that a2/a1 does not change) will cause no change
in the boundary. The first result apparently has been
verified many times in the selective adaptation literature,
and the second result has also been demonstrated (Miller,
1977; Sawusch and Pisoni, 1976). The third implication is
that, ceteris paribus, larger boundary shifts will occur for
less strongly categorized continua (i.e., with larger σ_D).
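Equations 2-37 and 2-38 can be evaluated directly. In the sketch below the detector centres follow Fig. 2.16, while σ_D = 0.15 and the scaling factors are hypothetical. The helper `response` implements Equation 2-35 and can be used to confirm that u1 = u2 at the predicted boundary.

```python
import math

def boundary(a1, a2, x10=0.3, x20=0.7, sigma_d=0.15):
    """Category boundary for scaled Gaussian detectors (Eq. 2-37)."""
    return 0.5 * (x10 + x20) + sigma_d ** 2 / (x20 - x10) * math.log(a1 / a2)

def boundary_shift(a1, a2, x10=0.3, x20=0.7, sigma_d=0.15):
    """Shift x_s relative to the unadapted boundary (x10 + x20) / 2 (Eq. 2-38)."""
    return sigma_d ** 2 / (x20 - x10) * math.log(a1 / a2)

def response(x, a, center, sigma_d=0.15):
    """Scaled Gaussian detector response (Eq. 2-35)."""
    return a * math.exp(-((x - center) ** 2) / (2 * sigma_d ** 2))

if __name__ == "__main__":
    for a1 in (1.0, 0.9, 0.7, 0.5):
        print(f"a1 = {a1:.1f}, a2 = 1.0: shift = {boundary_shift(a1, 1.0):+.4f}")
```

For example, reducing a1 to 0.7 (adaptation with a category-1 stimulus) gives x_s = (0.15²/0.4) ln 0.7 ≈ -0.020. This is a shift towards x10, the category of the adaptor. Scaling both detectors by the same factor leaves the boundary unchanged.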


2.4.1 Discrimination Under Conditions of Adaptation

Cooper (1974) investigated the shift in the peaks of
the ABX discrimination curve for /ba/, /da/ and /ga/
adaptors on an F2-F3 continuum. His results show that the
shift  in  the  peak  of  the  discrimination  function  is  in  the 
same  direction  and  of  approximately  the  same  magnitude  as 
the  shift  in  the  boundary  of  the  corresponding 
identification  function.  This  suggests  that  there  is  an 
intimate  relationship  between  the  location  of  the 
identification  boundary  and  the  peak  of  the  discrimination 
function  (or  in  the  light  of  the  previous  discussions,  the 
peak  of  the  dispersion  function). 


Consider the dispersion function defined by Equation 2-31.
The generalization of this function for the Gaussian
detector functions given by Equation 2-35 is

$$ D(x) = \frac{(x_{20} - x_{10})\, u_1 u_2}{\sigma_D^2 \left( u_1^2 + u_2^2 \right)} \tag{2-39} $$


Now, the peak of the dispersion function occurs for x such
that dD/dx = 0. Since u1' = -(x - x10)u1/σ_D² and
u2' = -(x - x20)u2/σ_D², differentiating Equation 2-39 gives

$$ \frac{dD}{dx} = \frac{x_{20} - x_{10}}{\sigma_D^2} \, \frac{(u_1 - u_2)(u_1 + u_2)}{u_1^2 + u_2^2} \, D(x) $$

and setting it to zero produces

$$ (u_1 - u_2)(u_1 + u_2)\, D(x) = 0 \tag{2-40} $$

provided that u1 and u2 do not go to zero simultaneously.
Since the dispersion function D(x) (for Gaussian detectors)
is non-zero for finite values of x, and u1 + u2 > 0, it
follows from Equation 2-40 that

$$ u_1 = u_2 \tag{2-41} $$

Thus, this detector model of categorical perception has the
property that the peak of the dispersion function (and hence
of the discrimination function) will always coincide with the
phonetic boundary (for detectors with Gaussian response
functions).
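Equation 2-41 can be checked numerically under adaptation. The sketch below uses hypothetical parameters (centres 0.3 and 0.7, σ_D = 0.15). It locates the maximum of Equation 2-39 by ternary search and compares it with the boundary of Equation 2-37. Ternary search is valid here because D(x) depends on x only through the ratio u1/u2, whose logarithm is linear in x, so D is unimodal.

```python
import math

X10, X20, SIGMA_D = 0.3, 0.7, 0.15  # hypothetical detector parameters

def responses(x, a1, a2):
    """Scaled Gaussian detector outputs (Eq. 2-35)."""
    u1 = a1 * math.exp(-((x - X10) ** 2) / (2 * SIGMA_D ** 2))
    u2 = a2 * math.exp(-((x - X20) ** 2) / (2 * SIGMA_D ** 2))
    return u1, u2

def dispersion(x, a1, a2):
    """Angular dispersion for scaled Gaussian detectors (Eq. 2-39)."""
    u1, u2 = responses(x, a1, a2)
    return (X20 - X10) * u1 * u2 / (SIGMA_D ** 2 * (u1 ** 2 + u2 ** 2))

def dispersion_peak(a1, a2, lo=0.0, hi=1.0, iterations=200):
    """Ternary search for the maximum of D(x) on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dispersion(m1, a1, a2) < dispersion(m2, a1, a2):
            lo = m1  # the peak cannot lie in [lo, m1]
        else:
            hi = m2  # the peak cannot lie in [m2, hi]
    return 0.5 * (lo + hi)

def boundary(a1, a2):
    """Stimulus value at which u1 = u2 (Eq. 2-37)."""
    return 0.5 * (X10 + X20) + SIGMA_D ** 2 / (X20 - X10) * math.log(a1 / a2)

if __name__ == "__main__":
    for a1, a2 in [(1.0, 1.0), (0.7, 1.0), (1.0, 0.6)]:
        print(a1, a2, dispersion_peak(a1, a2), boundary(a1, a2))
```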

2.4.2  The  Effect  of  Adaptation  on  the  Stimulus  Trajectory 

The effect of adaptation on the stimulus trajectory is
shown in Fig. 2.17. Desensitizing one detector is
equivalent to scaling the corresponding axis of the decision
plane by the same factor. This causes a distortion of the
stimulus trajectory such that the point u1 = u2 now
corresponds to a different value of x. Two results
automatically follow from this. One, a desensitization of
one or both detectors is equivalent to a change in response
bias. For a desensitized detector configuration, the
unbiased decision line is u1 = (a2/a1)u2, which is
equivalent to a decision line at an angle φ where tan φ =
a1/a2. However, a change in bias also corresponds to a
change in the angle of the decision line, e.g., u2 = b u1,
where b = 1 corresponds to the 45 degree decision line. Under
conditions of adaptation, the category boundary is then
defined by u2 = (b a1/a2)u1, which shows that, according to
this model, response bias and adaptation are formally
inseparable: there is no way to distinguish between the
detector-desensitization and response-bias accounts of
phonetic boundary shift by simply measuring the boundary shift.

Fig. 2.17. Effect of adaptation on a two-detector system to
a stimulus from category 1 (i.e., only u1 is affected).
The arrows connect identical values of x. Note that
the category boundary (the point where the stimulus
trajectory crosses the decision line) shifts towards
the category of the adaptor
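The formal equivalence of adaptation and bias can be illustrated by labelling the same points under two manipulations. In the sketch below, the function names are hypothetical. a1 and a2 scale the detector outputs, and b tilts the decision line u2 = b u1. Labels depend only on the product b·a1/a2, so desensitizing detector 1 to 60% yields exactly the same labels as an unadapted listener with b = 0.6.

```python
import math
import random

def label(u1_raw, u2_raw, a1=1.0, a2=1.0, b=1.0):
    """Label 2 if the adapted outputs lie above the decision line u2 = b * u1, else 1.

    a1 and a2 scale the detector outputs (adaptation); b tilts the line (bias).
    """
    u1, u2 = a1 * u1_raw, a2 * u2_raw
    return 2 if u2 > b * u1 else 1

def boundary_angle_deg(a1=1.0, a2=1.0, b=1.0):
    """Angle (degrees) of the effective boundary u2 = (b * a1 / a2) * u1 in the unadapted plane."""
    return math.degrees(math.atan(b * a1 / a2))

if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(10000)]
    adapted = [label(p, q, a1=0.6) for p, q in points]  # detector 1 desensitized to 60%
    biased = [label(p, q, b=0.6) for p, q in points]    # no adaptation, shifted criterion
    print(adapted == biased, boundary_angle_deg(a1=0.6), boundary_angle_deg(b=0.6))
```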

This model of selective adaptation suggests yet another
effect. Assuming that the output of the detectors increases
monotonically with stimulus intensity, e.g.,

$$ u_1 = h_1(I)\, u_1(x) \tag{2-42a} $$

$$ u_2 = h_2(I)\, u_2(x) \tag{2-42b} $$


radial distance in the u1-u2 plane ought to represent an
intensive aspect of the stimuli. If the detector response
curves merely reflect the sensitivity of the detector to
certain stimuli, then decreasing the intensity of the
stimulus should also produce a translation of the stimulus
trajectory in the direction of the origin. Thus, in this
model, decreasing the sensitivity of a detector is
equivalent to decreasing the intensity of the signal to
which the detector responds. Miller, Eimas and Root (1977)
conducted a selective adaptation experiment using /bae/,
/dae/ and /gae/ stimuli, each of which was constructed with
nine levels of attenuation of F2 and F3 with respect to
F1. Their results show that after adaptation to, say,
/bae/, the /bae/ must be more intense to obtain a level of
identification equal to that in the pre-adaptation condition.
Similar results were found with the /gae/ adaptor. This is
the only experiment to date which has attempted to see if
the desensitization of a particular detector ("channel of
analysis" in their terminology) can be restored by an
equivalent increase in intensity, and their results tend to
support the predictions of the above model.

2.4.3  A  New  Experimental  Paradigm 

The  detector  model  presented  above  suggests  a  new 
experimental  paradigm.  Consider  a  signal  which 
simultaneously  contains  the  acoustic  cues  corresponding  to 
two positions x1 and x2 along the x-continuum.16 Such a
signal, according to this detector model, will generate u1
and u2 as

$$ u_1 = u_1(I, x) = h_1(I)\, u_1(x) \tag{2-43a} $$

and

$$ u_2 = u_2(I, x) = h_2(I)\, u_2(x) \tag{2-43b} $$

16 In Chapter 3 it will be shown how to construct such a
signal.

Holding x1 and x2 fixed but varying the relative
intensities I1 and I2 of these two cues gives

$$ u_1 = h_1(I_1)\, u_1(x_1) \tag{2-44a} $$

and

$$ u_2 = h_2(I_2)\, u_2(x_2) \tag{2-44b} $$


from which it is seen that the outputs of the detectors, u1
and u2, can be manipulated by altering intensities I1 and
I2. If, for instance, I1 and I2 are related by

$$ I_1 = I, \qquad I_2 = I_0 - I \tag{2-45} $$

then if h(I) increases monotonically with I and h(0) = 0, it follows that
there exists a value of I for which u1 = u2. That is,
relative intensity I will define a continuum between the two
categories, with I = 0 representing one category and I = I0
representing the other. A category boundary will exist for
the value of I for which u1 = u2. Thus, as I is varied from 0
to I0, u1 and u2 will again trace out a trajectory, as shown
in Fig. 2.18. Since this stimulus trajectory is similar to
that shown in Fig. 2.14, it would appear that, if these
assumptions are correct, a "relative intensity continuum"
will be formed. As will be shown in Chapter 3, such a
continuum, for some combinations of speech sounds, is also
categorically perceived.
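Here is a minimal sketch of the relative intensity continuum of Equations 2-44 and 2-45. The transducer h(I) = I^0.3, the detector values u1(x1) = 0.9 and u2(x2) = 0.8, and I0 = 1 are all hypothetical choices made only so that the example runs. Because h is increasing with h(0) = 0, u1 - u2 changes sign exactly once on [0, I0], so the boundary can be found by bisection.

```python
def h(intensity, exponent=0.3):
    """Hypothetical compressive power-law transducer: increasing, with h(0) = 0."""
    return intensity ** exponent

def detector_outputs(i1, i0=1.0, u1_x1=0.9, u2_x2=0.8):
    """Eqs. 2-44 and 2-45: cue 1 at intensity I1, cue 2 at I0 - I1."""
    return h(i1) * u1_x1, h(i0 - i1) * u2_x2

def boundary_intensity(i0=1.0, tol=1e-12, **kw):
    """Bisection for the I1 at which u1 = u2 (the category boundary)."""
    lo, hi = 0.0, i0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        u1, u2 = detector_outputs(mid, i0, **kw)
        if u1 < u2:
            lo = mid  # cue 2 still dominates: the boundary lies at larger I1
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(boundary_intensity(u1_x1=0.8, u2_x2=0.8))
    print(boundary_intensity())
```

With equal detector values the boundary falls at I1 = I0/2. With u1(x1) = 0.9 > u2(x2) = 0.8 the stronger cue needs less intensity, and the boundary moves to I1 ≈ 0.40.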

2.5 SUMMARY

In  summary,  the  analysis  of  the  two-detector  model 
given  above  shows  that  a  two-detector  configuration  produces 
a dispersion of the decision variable y = u1 - u2 (i.e., the
difference between the detector outputs) with respect to the
physical continuum "x". The shape of the dispersion function
is strictly determined by the detector response functions. Two
detectors which have opposite slopes at the phonetic
boundary will always lead to enhanced dispersion in the
vicinity of the category boundary (which is the point where
the detector outputs are equal).
(and  hence  discrimination)  is  a  strong  function  of  the 
slopes  of  the  detector  response  functions  near  the  boundary: 
the steeper the slope, the larger the dispersion.

Fig. 2.18. (a) Hypothetical stimulus trajectory for u1 = h1(I1)u1(x1)
and u2 = h2(I0 - I1)u2(x2) for fixed x1 and x2.
(b) Corresponding identification curve as I1 is varied
from 0 to I0.

In discrimination, the subjective task is one of perceiving
differences in sensation. These differences may be small or
large,  depending  on  the  physical  difference  between  the 
signals,  and  the  extent  to  which  the  auditory  system  is 
sensitive  to  these  differences.  It  may  well  be  the  case 
that  phonetic  differences  are  merely  "large  acoustic 
differences". This does not imply that a phonetic level of
processing  per  se  does  not  exist,  nor  does  it  belittle  the 
role  of  phonetic  memory  in  various  experimental  paradigms. 
The  ABX  or  oddity  paradigms,  for  instance,  without  doubt 
place  stringent  demands  on  phonetic  memory  (Miller  and 
Eimas,  1977;  MacMillan  et  al.,  1977;  Pisoni  and  Lazarus, 
1974; Pisoni, 1973; Fujisaki and Kawashima, 1970) and the
phonetic  memory  model  is  likely  appropriate  for  these 
conditions.  The  AX  paradigm,  on  the  other  hand,  places 
considerably  less  load  on  phonetic  memory,  and  a  model  based 
on  auditory  rather  than  phonetic  differences  may  be 
appropriate  to  account  for  observed  discrimination  data. 


Although  few  specific  suggestions  have  been  made  for 
the  embodiment  of  feature  detectors  in  neurophysiological 
terms  (Abbs  and  Sussman,  1971,  suggest  the  term  "neuro- 
sensory  receptive  field"),  it  follows  that  in  order  for  the 
concept  to  have  substance,  a  specific  formulation  must  be 
given.  Simon  and  Studdert-Kennedy  (1978)  caution  against  a 
literal  interpretation  of  the  detector  metaphor,  and  suggest 
the  use  of  the  term  "channel  of  analysis"  as  being  more 
neutral.  This  is  sound  advice,  but  the  issue  is  more  than 
one of just terminology. The question is not whether or not
there exist detectors specially tuned to selected
attributes of speech signals, but whether or not the nervous
system can behave as if this were the case. With a fixed
set of stimuli drawn from a continuum, the "detector"
metaphor may be a perfectly suitable characterization. The
analysis conducted in this chapter is a mathematical
characterization of the detector metaphor, and does not make
any specific claims regarding the physical existence or
physiological makeup of such detectors, were they to exist.
Rather, it represents an attempt to quantify proposals which
have appeared from time to time in the literature, and the
appropriateness of the metaphor may hopefully be clarified
by so doing.


CHAPTER  3 


CATEGORICAL  PERCEPTION  OF  A  RELATIVE  INTENSITY  CONTINUUM 


In this chapter, a new technique for investigating
categorical perception is presented. The continuum in this
case is the relative intensity of two CV stimuli whose time
waveforms are added together. When the two CV (or VC)
stimuli contain phonemes which normally occur in the same
position in the syllable (i.e., initial or final) and which
lie on an acoustic continuum, the percepts fuse. When the
relative intensities of the two components (C1 and C2) are
varied, a "continuum" is created which is categorically
perceived. For syllable-initial stop consonants (e.g., /b/
and /d/), the relative intensity continuum has all the
properties of an F2-F3 continuum but has the distinct
advantage of being specified by a single physical
parameter.  In  this  chapter,  various  experiments  are 
conducted  to  investigate  the  origin  and  generality  of  the 
effect. ABX and AX discrimination results show that
discriminability is poor except when components C1 and C2 are
nearly equal in intensity. The dispersion model of AX
discrimination (Section 2.2) is fitted to the experimental
data, and the fit is observed to be quite satisfactory. On
the  basis  of  the  observed  results,  the  phonetic-memory  model 
of  categorical  perception  can  be  rejected. 

3.1 CREATION OF THE TEST STIMULI

Fig. 3.1 shows schematically how an ambiguous CV
(/bae/-/dae/) signal is created.1 Two /bae/ and /dae/ signal
waveforms with similar f0's are added together, and the
result is a waveform which contains the phonetic cues for
both /bae/ and /dae/. Such a signal is found to be
perceptually ambiguous, and is perceived as either /bae/ or
/dae/. The quality of the resulting stimulus depends on how
well the two component stimuli are aligned, and mutual
interference can be minimized if the two waveforms are
suitably aligned.

1 Most of the experiments described in this thesis employed
this particular pair of stimuli. For this reason, the
stimulus construction procedure will be described in terms
of these stimuli. However, the procedure (and the
perceptual effect) is quite general, and can be applied to
various other signal waveforms, including syllable-final
stop consonants.

Fig. 3.1. Schematic formant transitions (frequency versus time) for a composite
/bae/-/dae/ stimulus

Fig. 3.2. Example of covariance trace for pitch periods one
through six for /bae/ and /dae/. The arrows indicate
the points of maximum correlation between corresponding
/b/ and /d/ pitch periods

The procedure for "suitably" aligning the stimuli is
best explained by means of an example. Multiple tokens of
/bae/ and /dae/ were recorded by the author with
approximately equal fundamental frequencies (f0 = 100 Hz)
and steady-state vowel formant values. The recording was
carried out in an acoustically isolated chamber using a TEAC
AH-70 tape recorder. A /bae/-/dae/ pair was then selected
on the basis of judgements of the similarity of the f0's and
the compatibility of the formant values of the steady-state
vowels, as determined from Sonagrams of the stimuli and
plots of the signal waveforms. These tokens of /bae/ and
/dae/ were digitized at 16 kHz and stored in a disk file for
later processing.2 The first nine pitch periods of these two
waveforms were extracted and stored separately.3 The
waveform with the smallest f0 (/b/ in this example) was
selected as the "standard", to which the other signal (/d/)
was then temporally aligned.

2 All signal preparation and presentation was carried out
using a programming system for the PDP-12 designed by the
author (see Stevenson and Stephens, 1978).

3 The following notation will be used: /bae/ will refer to
the entire CV syllable, while /b/ will refer only to the
extracted formant transitions from that syllable. /ae/ will
represent the steady-state vowel, which is the tenth through
last glottal pulse of the CV waveform.

The alignment procedure was as follows: separate signals
/b/_j and /d/_j (j = 1, 2, ..., 9) were created from each of
the pitch periods of the digitized /b/ and /d/ waveforms.
Each /d/_j was then aligned with its corresponding /b/_j by
calculating


$$ r_j(\tau) = \sum_{i} s_{b_j}(i)\, s_{d_j}(i + \tau) \tag{3-1} $$

where s_b and s_d are the waveform amplitudes of /b/ and
/d/ respectively. The two signals were considered aligned
for the offset τ which yielded a maximum r_j. (Fig. 3.2 shows
the plots of r_j for the first six pitch periods.) After
each of the /d/_j was treated in this fashion, a new set of
/d/ formant transitions was created by concatenating these
adjusted /d/ pulses. The resulting /d/ waveform now
matched the /b/ waveform on a pulse-for-pulse basis (see
Fig. 3.3). To complete the stimulus preparation, the /d/
was then scaled so that its overall intensity was equal to
that for the /b/, where the intensity was measured by

$$ I = \sum_{i=1}^{N} s_i^2 \tag{3-2} $$

The summation is taken over the nine pitch periods of the
formant transitions (total of N points). These /b/ and /d/
signals, together with the steady-state vowel /ae/ (which was
the tenth through last pitch period of /bae/), were stored in
a file, and were used as the basis signals for construction
of all test stimuli.

Fig. 3.3. Aligned /b/ and /d/ formant transitions

Presentation  stimuli  were  prepared  from  these  three  signals 
by  first  loading  them  into  the  computer  memory  and  then 
adding  the  /b/  and  /d/  formant  transitions  together  point  by 
point according to

$$ s'_i = \alpha\, s_{b_i} + (1 - \alpha)\, s_{d_i} \tag{3-3} $$

Finally,  the  resultant  signal,  s',  was  concatenated  to  the 
steady-state  vowel  /ae/.  The  linear  weighting  of  the  /b/ 
and  /d/,  along  with  the  convergence  of  both  of  these 
waveforms  to  the  same  steady-state  vowel  ensured  continuity 
of  the  amplitudes  of  the  mixed  formant  transitions  and 
steady-state  vowel. 
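The preparation steps above (per-period alignment, energy equalization and point-by-point weighting) can be sketched in Python as follows. This is not the author's PDP-12 program. It assumes that the waveforms have already been segmented into pitch periods, and it takes Equation 3-1 to be a plain lagged cross-product sum. The search range `max_lag` is hypothetical, and all function names are illustrative.

```python
import math

def best_lag(ref, sig, max_lag):
    """Lag maximizing r(lag) = sum_i ref[i] * sig[i + lag] (assumed form of Eq. 3-1)."""
    best, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        r = sum(ref[i] * sig[i + lag]
                for i in range(len(ref)) if 0 <= i + lag < len(sig))
        if r > best_r:
            best, best_r = lag, r
    return best

def shifted(sig, lag, length):
    """`length` samples of sig starting at offset lag; zero outside sig."""
    return [sig[i + lag] if 0 <= i + lag < len(sig) else 0.0 for i in range(length)]

def align_periods(b_periods, d_periods, max_lag=20):
    """Align each /d/ pitch period to its /b/ counterpart, then concatenate."""
    aligned = []
    for b_j, d_j in zip(b_periods, d_periods):
        lag = best_lag(b_j, d_j, max_lag)
        aligned.extend(shifted(d_j, lag, len(b_j)))
    return aligned

def energy(sig):
    """Sum of squared samples, as in Eq. 3-2."""
    return sum(v * v for v in sig)

def match_energy(sig, ref):
    """Scale sig so that its energy equals that of ref."""
    gain = math.sqrt(energy(ref) / energy(sig))
    return [gain * v for v in sig]

def mix(s_b, s_d, alpha):
    """Eq. 3-3: s'_i = alpha * s_b[i] + (1 - alpha) * s_d[i]."""
    return [alpha * b + (1.0 - alpha) * d for b, d in zip(s_b, s_d)]

def make_stimulus(s_b, s_d_aligned, vowel, alpha):
    """Mix the energy-matched transitions and append the steady-state vowel."""
    return mix(s_b, match_energy(s_d_aligned, s_b), alpha) + list(vowel)
```

Because the same vowel follows every mixture, only the first nine pitch periods differ between stimuli. This mirrors the continuity argument made in the text.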

Fig. 3.4a shows the relative amplitudes of the /b/ and
/d/ waveforms as a function of the weighting parameter α.
Because the amplitudes of the signal waveforms were varied
linearly, the intensity of the waveforms varied
quadratically. Since both signals were of the same duration
(T ≈ 90 milliseconds), their intensities can be represented
by the total energies

$$ E_b = \sum_{i=1}^{N} s_b^2(i) \tag{3-4a} $$

and

$$ E_d = \sum_{i=1}^{N} s_d^2(i) \tag{3-4b} $$

where s_b(i) and s_d(i) are the sampled time waveforms of the /b/ and
/d/ formant transitions. The intensity of the composite
signal can thus be expressed as

$$ E_{bd} = \alpha^2 E_b + (1 - \alpha)^2 E_d + 2\alpha(1 - \alpha)\,\rho\,\sqrt{E_b E_d} \tag{3-5} $$

where ρ is the Pearson product-moment correlation
coefficient between the /b/ and /d/ waveforms. Fig. 3.4b shows the measured E_bd as a function
of the weighting parameter α. As expected, the curve is
quadratic in shape, and has a minimum at α = 0.5. Since
the /b/ and /d/ formant transitions were equated for overall
intensity, the E_bd curve reaches the same maximum value at
the extreme positions α = 0 and α = 1. The solid line
through the data points in Fig. 3.4b is a quadratic curve
fitted by least squares:

$$ E_{bd} = 1.00 - 1.36\,\alpha + 1.36\,\alpha^2 \tag{3-6} $$

This equation was used to eliminate differences in overall
intensity in the discrimination and binaural experiments
described later in this chapter. Letting E_b = E_d = 1,
Equation 3-5 becomes

$$ E_{bd} = 1 - 2(1 - \rho)\,\alpha + 2(1 - \rho)\,\alpha^2 \tag{3-7} $$

whence ρ = 0.32.

Fig. 3.4. (a) Variation of amplitude of /b/ and /d/ waveforms
with parameter α. (b) Variation of E_bd with α.
The solid line is given by Equation 3-6.
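The arithmetic behind Equations 3-5 to 3-7 is easy to check. The sketch below evaluates Equation 3-5 and recovers ρ from the fitted coefficient of Equation 3-6. It also confirms, on a short made-up pair of zero-mean sequences, that Equation 3-5 reproduces the energy of the mixture exactly when ρ is computed as the normalized inner product. For zero-mean signals this equals Pearson's ρ. The function names are illustrative.

```python
import math

def composite_energy(alpha, e_b=1.0, e_d=1.0, rho=0.32):
    """Energy of alpha * s_b + (1 - alpha) * s_d (Eq. 3-5)."""
    return (alpha ** 2 * e_b + (1 - alpha) ** 2 * e_d
            + 2 * alpha * (1 - alpha) * rho * math.sqrt(e_b * e_d))

def rho_from_coefficient(c):
    """Eq. 3-7: with E_b = E_d = 1 the alpha and alpha^2 coefficients are -/+ 2(1 - rho)."""
    return 1.0 - c / 2.0

def normalized_correlation(x, y):
    """sum(x*y) / sqrt(sum(x^2) * sum(y^2)); equals Pearson's rho for zero-mean signals."""
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

if __name__ == "__main__":
    rho = rho_from_coefficient(1.36)
    e_min = composite_energy(0.5, rho=rho)
    print(f"rho = {rho:.2f}, E(0.5) = {e_min:.2f} ({10 * math.log10(e_min):.1f} dB)")
```

With the fitted coefficient 1.36 this gives ρ = 1 - 1.36/2 = 0.32. The minimum composite energy is 0.66 at α = 0.5, about 1.8 dB below the endpoints.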

3.2  IDENTIFICATION  CURVES 

Informal  experiments  indicated  that  a  sharp  transition 
between  phonetic  categories  existed  where  the  intensities  of 
the two CV components were approximately equal (i.e., α =
0.5). A series of experiments was carried out to determine
the nature of the identification function for the continuum
determined by the parameter α. All stimuli were
constructed according to the procedure outlined in Section
3.1 above, and were presented on-line to one or more subjects
in a quiet listening environment as detailed below. The
stimulus pairs were /bae/-/dae/, /bet/-/det/ and /ra/-/la/.
Sonagrams of some of the /bae/-/dae/ combinations are shown
in Fig. 3.5.

3.2.1  Experimental  Setup 

The  physical  arrangement  of  the  computerized 
presentation  facility  is  shown  in  Fig.  3.6.  The  digitized 
stimuli  were  converted  into  analogue  voltages  at  a  sampling 
rate  of  16  kHz  by  10-bit  D/A  converters  which  were  part  of 
the  PDP-12  configuration.  The  stimuli  were  delivered  to  the 
remote listening station over lines with a measured 55-60 dB
signal-to-noise ratio and then filtered by a Rockland Series
1520 filter (Butterworth) set to lower and upper cut-off
frequencies of 70 and 7000 Hz respectively. The output of
the filter was amplified by a Braun amplifier (Type CSV 250)
which was linear over a 50 dB dynamic range (see Fig. 3.7),
and then fed into a bus which serviced Telephonics TDH-49
headphones. The frequency response of the matched
filter/amplifier/earphone combination to a swept sinusoidal
voltage of 80 dB re 0.0002 dynes/cm² is shown in Fig. 3.8.

Fig. 3.5. Sonagrams of /bae/-/dae/ stimuli. From
left to right the values of α are:
α = 0 (extreme left, corresponding to a
pure /dae/), α = 0.25, α = 0.5, α = 0.75, α = 1.0
(extreme right, corresponding to a pure /bae/).
The line drawing below illustrates
the formant composition

Fig. 3.6. Computerized presentation facility. The headphones and
response boxes are in a quiet listening environment
isolated from the computer

The listening level was set at 80 dB SPL for a 1000 Hz sine wave with an RMS voltage equal to that of the steady-state vowel of the /bae/-/dae/ pair. The absolute intensity setting was determined with the aid of a Bruel & Kjaer artificial ear (Type 4153) calibrated with a Bruel & Kjaer Pistonphone (Type 4230). The intensity calibration was checked prior to each session and varied by less than ±0.3 dB from day to day. For monaural presentation, crosstalk was eliminated at the headphones by disconnecting the input to the opposite earphone.
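Two of the figures quoted in this setup can be checked with standard formulas. The sketch assumes the usual 20 µPa reference for dB SPL (0.0002 dynes/cm² expressed in pascals) and the textbook limit of 6.02N + 1.76 dB for a full-scale sine through an ideal N-bit converter; neither calculation comes from the original measurements, and the helper names are hypothetical.

```python
import math

P_REF_PA = 20e-6  # 0.0002 dynes/cm^2 expressed in pascals (1 dyne/cm^2 = 0.1 Pa)


def spl_to_pascal(level_db: float) -> float:
    """RMS pressure in pascals for a sound pressure level in dB re 20 uPa."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


def ideal_quantization_snr_db(bits: int) -> float:
    """Upper bound on SNR for a full-scale sine through an ideal N-bit converter."""
    return 6.02 * bits + 1.76


print(f"80 dB SPL = {spl_to_pascal(80.0):.3f} Pa RMS")
print(f"10-bit D/A ceiling = {ideal_quantization_snr_db(10):.1f} dB")
```

On these assumptions the 80 dB SPL listening level corresponds to 0.2 Pa RMS, and the measured 55-60 dB signal-to-noise ratio is within a few dB of the roughly 62 dB that an ideal 10-bit converter could deliver.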

Fig. 3.7. Response of the two Braun amplifiers to a 1000 Hz sinusoid. The cross indicates the operating point for all experiments.

Fig. 3.8. Frequency response of the left (dashed line) and right (solid line) Rockland filter, Braun amplifier and TDH-49 headphone combinations.

The procedure for generating the stimuli for on-line presentation was as follows. The formant transitions of the two signals being combined (e.g., /b/-/d/, /r/-/l/, etc.) were loaded into core and scaled by factors of α and 1 - α, which were read in from a file of randomized numbers. There were 21 stimuli, corresponding to values of α from 0 to 1 in equal steps of Δα = 0.05. The scaled formant transitions were then added together point by point according to Equation 3-3 above. The steady-state vowel was then loaded into core from the disk file and concatenated to the composite formant transitions, and the entire signal was played back.

The loading, scaling, addition and concatenation took approximately 1.5 seconds. After the stimulus was played back, the switches at the remote listening stations were monitored for the subjects' responses. The program waited until all subjects had pressed one of their switches before proceeding with the next presentation. The interstimulus interval was thus somewhat variable, but averaged about four seconds. The identity of the stimulus presented and the subjects' switch choices were recorded in a disk file for later processing. After a run, the number of /b/ responses was automatically tabulated, and the identification curve was displayed on a storage oscilloscope.
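The on-line construction procedure described above can be sketched as follows. This is a minimal modern illustration, not the PDP-12 program: it assumes the transitions are already time-aligned sample arrays of equal length and that Equation 3-3 is the weighted sum with weights α and 1 - α described above; the function name `make_stimulus` and the toy sinusoids standing in for recorded speech are hypothetical.

```python
import numpy as np


def make_stimulus(trans_b, trans_d, vowel, alpha):
    """Weight two time-aligned formant transitions by alpha and 1 - alpha,
    add them sample by sample, and append the shared steady-state vowel."""
    trans_b = np.asarray(trans_b, dtype=float)
    trans_d = np.asarray(trans_d, dtype=float)
    if trans_b.shape != trans_d.shape:
        raise ValueError("transitions must be aligned to the same length")
    composite = alpha * trans_b + (1.0 - alpha) * trans_d  # point-by-point sum
    return np.concatenate([composite, np.asarray(vowel, dtype=float)])


# The 21 weights 0.00, 0.05, ..., 1.00 that define one continuum
ALPHAS = np.round(np.linspace(0.0, 1.0, 21), 2)

# Toy usage with made-up 16 kHz signals (not the thesis recordings)
fs = 16000
n_trans, n_vowel = 640, 3200  # 40 ms of "transition", 200 ms of "vowel"
t = np.arange(n_trans) / fs
trans_b = np.sin(2 * np.pi * 1200 * t)
trans_d = np.sin(2 * np.pi * 1800 * t)
vowel = np.sin(2 * np.pi * 700 * np.arange(n_vowel) / fs)
stim = make_stimulus(trans_b, trans_d, vowel, ALPHAS[10])  # alpha = 0.5
```

Keeping the vowel outside the weighted sum mirrors the description above, in which only the formant transitions are mixed and a single steady-state vowel is appended to every composite.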

The stimuli were blocked into groups of 25 in order to break up the randomization, which cycled every 21 stimuli (i.e., all 21 stimuli were played back before any stimulus was repeated). A one-second 1000 Hz tone was played back at the end of each block of 25 stimuli, followed by five seconds of silence. (This was necessary to distinguish between an interblock silence and a subject's switch failing to record.) Each value of α was presented 10 times, for a total of 21 × 10 = 210 stimuli.⁴
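One way to reproduce the randomization scheme described above (a fresh permutation of all 21 stimuli in each cycle, with block boundaries every 25 presentations so that blocks and cycles do not coincide) is sketched below. The original program read its order from a pre-randomized file; the function `presentation_order` and its `seed` argument are illustrative assumptions.

```python
import random


def presentation_order(n_stimuli=21, repeats=10, block_size=25, seed=None):
    """Concatenate `repeats` independent permutations of the stimulus indices
    (so no stimulus repeats within a cycle), then cut the sequence into
    blocks of `block_size`; a tone and silence would separate the blocks."""
    rng = random.Random(seed)
    order = []
    for _ in range(repeats):
        cycle = list(range(n_stimuli))
        rng.shuffle(cycle)
        order.extend(cycle)
    blocks = [order[i:i + block_size] for i in range(0, len(order), block_size)]
    return order, blocks


order, blocks = presentation_order(seed=1)
print(len(order), [len(b) for b in blocks])
# 210 [25, 25, 25, 25, 25, 25, 25, 25, 10]
```

With the default arguments the 210 presentations fall into eight full blocks of 25 followed by a final block of 10, so the block structure gives no cue to where one randomization cycle ends.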

3.2.2  EXPERIMENT  1:  Identification  Scores 

The first set of experiments involved identification of the /bae/-/dae/ stimulus pair. Five subjects participated in this and other studies, which involved periodic testing over a period of approximately eight months. The subjects were faculty members and graduate students of the Department of Linguistics. The author participated as a subject in all tests. None of the subjects reported any hearing deficiencies.⁵

Preliminary identification runs indicated that each subject perceived each composite stimulus as either /bae/ or /dae/, with no phonetic intrusions. The α continuum⁶ was strongly categorized⁷ into a region 0 ≤ α < α50 corresponding to /dae/ and a region α50 < α ≤ 1 corresponding to /bae/, where α50 was the value of α at which 50 percent recognition occurred. The value of α50 was different for each subject and, although the stimuli had been equated for overall intensity, did not occur at α = 0.5 for any subject. (Possible reasons for the subject differences are discussed below.) All subjects found the test to be trivial. The endpoint stimuli, being naturally spoken tokens of /bae/ and /dae/, were always identified with 100 percent accuracy. Subjects experienced any difficulty in deciding on only five percent or fewer of the stimulus presentations. Each stimulus was generally perceived as a clear instance of /bae/ or /dae/, and little interference from the other component was evident. (Stimuli in the transition region showed a slight increase in noisiness, but even so were easily classified as either /bae/ or /dae/.)

⁴ For purposes of comparison, the same 21 stimuli were used in all /bae/-/dae/ experiments described in this thesis.

⁵ Audiometric records (Appendix A) show that some of the subjects had substantial hearing losses at frequencies greater than 4000 Hz.

⁶ Changing α from 0 to 1 causes a change in phonetic percept, and thus it is meaningful to speak of α as defining a "continuum".

⁷ It will be shown later in this chapter that this continuum is categorically perceived.

To test the stability of the subjects' boundaries (α50), and also to test for ear differences, the following experiment was conducted. Each of the five subjects was tested a total of three times in each of the monaural left and monaural right conditions. The earphones were calibrated as described in Section 3.2.1 above, so that the left and right earphones were as nearly matched as possible (see Fig. 3.8).

Typical identification runs for the five subjects are shown in Fig. 3.9. Fig. 3.9a shows the relative frequency of /bae/ judgements as a function of α, and Fig. 3.9b shows the same data plotted as a function of Iα, the intensity of the /b/ component relative to the /d/ component, in decibels. Iα is defined by

$$I_\alpha = 20\log_{10}\frac{\alpha}{1-\alpha} \qquad (3\text{-}8)$$

Fig. 3.9. (a) /bae/-/dae/ identification curves as a function of α. (b) The same identification curves as a function of Iα, the relative intensity of the /bae/ to /dae/ component expressed in decibels.

The identification data for the left and right ears were then fitted by a least-squares technique to the normal ogive

$$P(I_\alpha) = \Phi\!\left(\frac{I_\alpha - I_{50}}{\sigma}\right) \qquad (3\text{-}9)$$

where Φ is the cumulative standard normal distribution function. The fitting process characterized each identification curve by I50, the value of Iα at which 50 percent recognition occurred, and σ, the standard deviation. The average boundaries for the left and right conditions are shown below in Table 3-1.

A two-way ANOVA (SUBJECT × EAR) was carried out on both the boundaries (I50) and the widths of the transition regions (σ). Factor SUBJECT was significant for the boundaries (p < 0.001) and accounted for 95 percent of the variance. EAR was not significant for the boundaries. Neither SUBJECT nor EAR was significant for the widths of the transition regions. There were no significant SUBJECT × EAR interactions.
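Equations 3-8 and 3-9 and the least-squares fit can be sketched with SciPy. The thesis does not say which fitting routine was used, so `scipy.optimize.curve_fit` is a stand-in, and the helper names are hypothetical. The synthetic data below are generated from an assumed boundary (I50 = -0.69 dB, σ = 1.18 dB) purely to check that the fit recovers known parameters; they are not subject data.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm


def relative_intensity_db(alpha):
    """Equation 3-8: level of the /b/ component relative to the /d/ component."""
    alpha = np.asarray(alpha, dtype=float)
    return 20.0 * np.log10(alpha / (1.0 - alpha))


def ogive(i_alpha, i50, sigma):
    """Equation 3-9: cumulative-normal probability of a /b/ response."""
    return norm.cdf((i_alpha - i50) / sigma)


def fit_boundary(alpha, p_b):
    """Least-squares estimates of (I50, sigma), both in dB.
    alpha = 0 and alpha = 1 map to -inf/+inf dB and must be excluded."""
    i_alpha = relative_intensity_db(alpha)
    (i50, sigma), _ = curve_fit(ogive, i_alpha, p_b, p0=[0.0, 1.0])
    return i50, abs(sigma)


# Synthetic, noiseless check using the 19 interior points of the continuum
alphas = np.round(np.linspace(0.05, 0.95, 19), 2)
p_b = ogive(relative_intensity_db(alphas), -0.69, 1.18)
i50_hat, sigma_hat = fit_boundary(alphas, p_b)
```

Fitting in the Iα domain rather than in α is what makes a symmetric ogive appropriate: α = 0.5 maps to 0 dB, and equal ratios on either side of it map to equal distances in decibels.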
TABLE 3-1

CATEGORY BOUNDARIES (/bae/-/dae/)

                    RIGHT EAR                          LEFT EAR
SUBJECT    I50     S.D.    σ       S.D.       I50     S.D.    σ       S.D.
           (dB)    (dB)    (dB)    (dB)       (dB)    (dB)    (dB)    (dB)
GR        -0.69    0.52    1.18    0.17      -0.47    0.56    0.88    0.37
JH         1.18    0.61    1.26    0.24       2.02    0.19    1.18    0.56
DS         1.92    0.87    1.36    0.62       0.91    1.01    1.18    0.53
GM        -5.00    1.01    1.69    0.99      -4.72    0.65    1.07    0.30
PA        -2.28    2.74    1.75    0.69      -1.44    1.14    1.56    0.28

Identification runs were also collected over a period of several months in the course of the selective adaptation and binaural studies described later in Chapter 6. In particular, subjects DS and JH performed over 40 such identification tests, and plots of the values of I50 for these subjects over the testing period are shown in Fig. 3.10. It can be seen that the two subjects' boundaries changed slowly over this time period. The reason for this change is not clear, but most of these data were collected during preliminary identification runs for the selective adaptation experiments described in Chapter 4. Similar stability was observed for the other subjects, who also contributed identification data spaced over several months.

Fig. 3.10. Long-term stability of the category boundary for two subjects.

The fact that the subjects show session-to-session and run-to-run variability of I50 which is smaller than the differences separating the subjects (see Table 3-1) suggests that both the boundary location (i.e., I50) and some of the fluctuation in the boundary may be physiologically determined. The trends observed in Fig. 3.10 also suggest this. On the other hand, response bias is the most likely cause of the session-to-session variations that are observed. To obtain an estimate of the possible influence of response bias, an informal experiment was conducted using two experienced subjects. The identification test described above was carried out twice. On the first run, the subjects were required to respond "d" to a given stimulus either if it was a clear /d/ or if they thought it was a boundary /b/. On the second run, the reverse task was demanded: they were to respond /d/ only if the stimulus was a non-boundary /d/. The results for these two subjects are shown in Fig. 3.11. The amount of boundary shift is quite large (3 dB for subject DS and 6.5 dB for subject JH), and it appears that a change in response bias may be sufficient to account for the large subject differences observed in Fig. 3.9. However, it must be pointed out that this experiment did not constitute an ordinary identification test. Some stimuli were called /b/ when it was perfectly obvious that under normal circumstances they would have been called /d/ (and vice versa). Thus, the changes in boundary shown in Fig. 3.11 are much greater than the variability observed between sessions (see Fig. 3.9 and Table 3-1). It is undoubtedly true that some of the session-to-session variability is caused by a change in response bias. However, some of the variability must be based on physiological mechanisms over which the subject has no overt control.

Fig. 3.11. Extreme limits of the /bae/-/dae/ identification curves for two subjects.

Hearing differences between subjects may also account for intersubject differences. Audiograms (Appendix A) show that two of the subjects (DS and PA) have regions of low spectral sensitivity, but these dips are for the left ear (most of the experiments were conducted using right ears only) and only at frequencies greater than 4000 Hz. Since the audiograms only show sensitivities at seven frequencies, no conclusive statements can be made regarding the effects of reduced spectral sensitivity in the vicinity of F2. However, since some of the observed differences are of the order of 5 dB or more, it is possible that hearing differences could account for some of the differences in the subjects' boundaries. But since no significant difference was found between ears, and the audiograms show ear differences as pronounced as the subject differences, this contribution may be minimal.

3.2.3  EXPERIMENT  2:  Identification  of  /bet/-/det/ 

To test the generalizability of the /b/-/d/ categorization with regard to the following vowel,⁸ tokens of /bet/ and /det/ were recorded by subject JH. (The change of speaker was intended to test the robustness of the effect with regard to articulatory idiosyncrasies.) The recording procedure and stimulus preparation were carried out for these stimuli as described above in Section 3.2 for the /bae/-/dae/ pair. The presentation paradigm was in all respects identical to that described in Experiment 1 above. The same five subjects participated, and each subject was run only once (right ear, monaural).

Fig. 3.12 shows the results of identification runs for five subjects using the stimulus pair /bet/-/det/. It can be seen that the slope of the transition region is similar to that obtained for the /bae/-/dae/ pair (Fig. 3.9 above). Curiously, the relative positions of I50 appear to be approximately the same for the five subjects as in the /bae/-/dae/ experiment, although the spread of boundaries is considerably reduced.⁹ This finding is interesting, since it suggests that differences in hearing sensitivity may be responsible for the boundary placement. (The formant transitions for /bet/ and /det/ are ordered similarly to those for /bae/ and /dae/.)

⁸ The effect was originally discovered using a pair of /bi/-/di/ stimuli.

Fig. 3.12. Identification curves for /bet/-/det/.

Fig. 3.13. Identification curves for /ra/-/la/.

Originally, it was hypothesized that the subject differences could be due to differential sensitivities to the various acoustic cues which are responsible for /b/ and /d/ recognition. It was felt that tokens of /b/ and /d/ from a different speaker and/or with a different following vowel might affect subjects differently. Insofar as a single identification run per subject can show, the present results indicate that this is not the case.

3.2.4  EXPERIMENT  3:  Identification  of  Liquids  and  Vowels 

To test whether a categorized continuum could be obtained with combinations formed from sounds other than stop consonants, a /ra/-/la/ pair was constructed from a set of /ra/ and /la/ tokens recorded by subject JH. The identification experiment (one run per subject) was carried out using the same presentation program and subjects as for Experiments 1 and 2. The results are shown in Fig. 3.13. Although the transition regions appear as steep as in the stop consonant tests (indicating a strong categorization), the subjective impression in this case was slightly different. Whereas in the /b/-/d/ experiments little mutual interference of the two sounds was evident, in the /r/-/l/ case there was somewhat of a tendency to perceive, say, an /r/ with a low-intensity /l/ in the background. Two subjects reported that additional "fusions" were heard: subject GM reported hearing both /bra/ and /bla/, and subject GR reported hearing /bla/. No such fusions were reported by the other subjects. In summary, the results of the /ra/-/la/ experiment appear to parallel the findings of /ra/-/la/ categorization obtained by varying the F2-F3 composition of synthetic stimuli (Miyawaki et al., 1975).

⁹ Less mutual interference was noticeable for the /bet/-/det/ stimuli than for the /bae/-/dae/ stimuli, which may be the reason for the smaller intersubject variation in I50.


Since vowels are perceived less categorically than stop consonants (Fry et al., 1962; but note Repp et al., 1978; Pisoni, 1973; Fujisaki and Kawashima, 1969), it was decided to use the same experimental paradigm to test whether or not a vowel continuum could be created. An informal test showed that a continuum could indeed be created, but that perception was continuous rather than categorical. Two vowels, /i/ and /ae/, with approximately equal F0s were combined according to

$$V' = \alpha\,\text{/i/} + (1-\alpha)\,\text{/ae/} \qquad (3\text{-}10)$$
Both vowels could always be heard simultaneously. For values of α > 0.5, the dominant percept was /i/, with a simultaneous but weaker /ae/ percept. The converse occurred for low values of α. The only stimuli in which the physically less intense vowel could not be heard were those with α = 0 or α = 1, where it is absent altogether. Since the step size was Δα = 0.05, this meant that the weaker vocalic percept was audible down to at least 25 dB below the stronger one. This clearly demonstrated that little or no categorization was occurring, and this line of investigation was not pursued further. However, mixing of vowels in this fashion is a topic worthy of investigation for its own sake.
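The "at least 25 dB" figure follows from the amplitude ratio at the smallest nonzero weight. The sketch below computes the level of the weaker vowel relative to the stronger one, assuming, as in Equation 3-8, that relative level is set by the amplitude weights alone (any difference in the two vowels' own energies is ignored); `level_difference_db` is a hypothetical helper.

```python
import math


def level_difference_db(alpha):
    """Level of the weaker component relative to the stronger one, in dB,
    for a mixture with amplitude weights alpha and 1 - alpha."""
    weak, strong = sorted((alpha, 1.0 - alpha))
    return 20.0 * math.log10(weak / strong)


for alpha in (0.05, 0.25, 0.45):
    print(f"alpha={alpha:.2f}: weaker vowel {level_difference_db(alpha):.1f} dB")
```

At α = 0.05 (or its mirror, 0.95) the weaker vowel sits about 25.6 dB below the stronger one, so audibility at every nonzero step implies audibility at least 25 dB down.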

3.2.5 EXPERIMENT 4: Identification of /ba/-/da/-/ga/

A preliminary experiment showed that /ba/-/da/-/ga/ combinations produced a triply ambiguous signal. When a stimulus with approximately equal proportions of /ba/, /da/ and /ga/ was played back repeatedly, it was possible to perceive any of the three phonetic categories. Thus, in spite of the spectral confusion which must have been occurring, it was clear that a three-way analogue of the /ba/-/da/ experiment was possible. If the /ba/, /da/ and /ga/ components are orthogonal, then a signal mixture given by

$$s' = \alpha s_b + \beta s_d + \gamma s_g \qquad (3\text{-}11)$$

(where α + β + γ = 1) would result in stimuli lying on a plane within a three-dimensional signal space, as shown in Fig. 3.14. This plane should be divided into three regions, with a triply ambiguous point at the intersection of the three phonetic boundaries. The degree of orthogonality of the three-signal mixture will be reflected by the amount of interference along the phonetic boundaries.

A sequence of /ba/, /da/ and /ga/ stimuli was recorded by subject JH, and the formant transitions for the /b/, /d/ and /g/ were stored separately in a disk file along with the steady-state vowel extracted from the /ba/. The /da/ and /ga/ were aligned to the /ba/ as previously described in Section 3.1. A randomized file consisting of a triangular design (αi, βi, γi) was created, where αi + βi + γi = 1. The scaling factors αi, βi and γi were varied in steps of 0.05. This resulted in 231 combinations with 0 ≤ αi, βi, γi ≤ 1. The stimuli were presented as in Experiment 1, except that a three-way response was required. Three push-buttons were provided, labelled /ba/, /da/ and /ga/. The stimuli were presented in blocks of 25, and on each presentation the subject was required to press the appropriate switch.

During each run, each stimulus combination (αi, βi, γi) occurred only once. Two subjects, JH and DS, were each run a total of 10 times, resulting in 10 judgements for each of the 231 stimuli.
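The size of the triangular design follows from counting the ways of splitting 20 steps of 0.05 among three non-negative weights, C(22, 2) = 231. A short enumeration confirms the count; the helper name is hypothetical, and exact fractions are used so that every triple sums to exactly 1.

```python
from fractions import Fraction


def triangular_design(steps=20):
    """All (alpha, beta, gamma) on a grid of 1/steps with alpha + beta + gamma = 1."""
    design = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # remaining steps go to gamma
            design.append((Fraction(i, steps), Fraction(j, steps), Fraction(k, steps)))
    return design


design = triangular_design()
print(len(design))  # 231
```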
Fig. 3.14. Division of hypothetical signal space into regions of /b/, /d/ and /g/ identification.
The results for the two subjects are shown in Fig. 3.15. The three symbols represent the modal values of the 10 judgements at each point. The clear division into three distinct regions, with only minor irregularities around the boundaries between categories (and the fact that each boundary lies along a line passing through the triple point and the opposite vertex), indicates that the /b/, /d/ and /g/ components of the stimuli were substantially orthogonal.

Only the /g/-/b/ boundary shows a disturbance, as might be expected on the basis of spectral confusions (Cutting, 1976). By and large, the stimuli are perceived as clear exemplars of /ba/, /da/ or /ga/, except in the vicinity of the "triple point", where considerable noisiness was evident (more than was observed in Experiment 1).¹⁰
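As an idealized reference for Fig. 3.14, suppose the listener always reports the component with the largest weight. This winner-take-all rule is assumed here for illustration only; it is not a model fitted to the data, and the names below are hypothetical. Ties then occur only on the lines α = β, β = γ and α = γ, each of which passes through the triple point (1/3, 1/3, 1/3) and the opposite vertex, which is the geometry the observed boundaries approximate.

```python
from fractions import Fraction

LABELS = ("/ba/", "/da/", "/ga/")  # weights alpha, beta, gamma of Equation 3-11


def strongest_component(weights):
    """Report the category with the largest weight; None on an exact tie."""
    top = max(weights)
    winners = [label for label, w in zip(LABELS, weights) if w == top]
    return winners[0] if len(winners) == 1 else None


steps = 20
grid = [(Fraction(i, steps), Fraction(j, steps), Fraction(steps - i - j, steps))
        for i in range(steps + 1) for j in range(steps + 1 - i)]
ties = [w for w in grid if strongest_component(w) is None]
print(len(ties))  # 12
```

Because 1/3 is not a multiple of 0.05, the 231-point grid never lands exactly on the triple point; only 12 grid points fall exactly on a boundary under this rule, four on each of the three boundary segments.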

Fig. 3.16 shows identification curves corresponding to slices through Fig. 3.15 for subject DS at constant values of γ (i.e., for an increasing /g/ component). For γ < 0.33 (approximately), the /g/ component has little influence on the /b/-/d/ boundary. These /ba/-/da/ identification curves show transition regions with a slope approximately the same as that previously observed in Experiments 1 and 2, so the addition of the /g/ component (with a simultaneous reduction in both /b/ and /d/ intensities) produces little interference until it is approximately as intense as the /b/-/d/ mixture. Although it is difficult to tell from these curves (since none of them represents a slice parallel to the phonetic boundaries), there does appear to be an increase in the number of /d/ responses at the /b/-/g/ boundary. The curious shape of the sections of constant /g/ in the vicinity of the triple point is primarily due to the fact that two phonetic boundaries are being intersected (compare Fig. 3.17 with Fig. 3.16d).

¹⁰ This experiment was originally attempted using a /bae/-/dae/-/gae/ triple similar to the /bae/-/dae/ pair used in Experiment 1. However, after several runs had been conducted, it became apparent that the /gae/ could easily be identified by its slight glide, /gyae/. To eliminate this glide, which appeared to be an artefact of articulation that additional recording attempts did not eliminate, subject JH recorded tokens of /ba/-/da/-/ga/, for which the glide was less noticeable.

Fig. 3.15. Results of the /ba/-/da/-/ga/ identification tests for two subjects: (a) subject JH, (b) subject DS. Symbols mark the modal response (/ba/, /da/ or /ga/) at each point of the triangular design.

Fig. 3.16. /ba/, /da/ and /ga/ identification curves as a function of increasing /ga/ component (γ).

Fig. 3.17. Graphical interpretation of the results shown in Fig. 3.16. Compare (a) with Fig. 3.15 and (b) with Fig. 3.16(d).

3.2.6 Summary of Identification Tests

The identification tests of Experiments 1 through 4 were trivial for all subjects, and little improvement (steepening of the transition region) was observed as the subjects gained experience. Several linguistically naive subjects were also tested, and yielded identification data comparable in all respects to those shown in Fig. 3.9. All identification curves were sigmoidal, and rarely anything but strictly monotonic. Deviations from 100 percent recognition of non-boundary stimuli could usually be attributed to erroneous switch pressing or to distraction or daydreaming on the part of the subject. Boundary stimuli were in general easy to label as belonging to one category or the other, although increased noisiness was evident for these stimuli.

Repeated playback of a boundary stimulus creates a curious effect. It is possible to hear either /bae/ or /dae/ from such a stimulus, as the listener chooses. It is not possible, however, to hear both simultaneously.¹¹ The effect can be likened to an "auditory Necker cube" phenomenon, in which either of two forms can be perceived, but never both simultaneously. The effect persists even when three stimuli, /ba/, /da/ and /ga/, are combined (Experiment 4 above). The binaural counterpart of this phenomenon has been noted: Ades (1974), investigating simultaneous dichotic adaptation, points out that

"... it is worth mentioning that when /bae/ is presented to one ear and /dae/ to the other at the same time, the subject will hear a single fused percept. The fused percept may be heard as either a /bae/ or /dae/, but it is quite impossible to hear the two inputs as separate entities." (p. 612)


This is not what would be expected if the effect depended solely on masking. The almost complete suppression of the weaker phonetic percept suggests that something more is involved. If simultaneous masking were the dominant psychophysical process, the influence of the masker would be expected to grow steadily with its intensity, rather than abruptly as was found in Experiments 1, 2 and 4. In fact, interference is only noticeable when one of the components is within a few dB of the other, i.e., in the transition region between the categories. Even then, the effect is one of added noise, not one of simultaneous percepts. To investigate further the importance of masking in the categorization of the α continuum, an experiment was conducted using a masker which was acoustically similar to both /b/ and /d/ but phonetically distinct.

¹¹ Two of the subjects with extensive phonetic training noted that stimuli in the transition region tended to be slightly asynchronous (/ dae/). This slight asynchrony was not noted for the /bet/-/det/ stimuli.

3.2.7  EXPERIMENT  5:  Masking  by  a  Vocalic  Masker 

A masking stimulus was created by replicating one of the pitch periods of the steady-state vowel /ae/, in order to produce a signal with a spectro-temporal structure similar to that of the /b/ and /d/ in the vicinity of their formant transitions. The amplitude envelope of the /ae/ masker was modified so that the intensity (Σs²) of each pitch period of the masker was identical to that of the corresponding pitch period of the /b/. This signal was aligned to the /b/ transitions by the procedure described in Section 3.1. The resulting signal, when concatenated to the steady-state vowel from which it was obtained, produced a natural-sounding /ae/. This masker was combined with the /b/ and /d/ formant transitions to create two series of stimuli:
$$b' = \alpha\,\text{/b/} + (1-\alpha)\,\text{/ae/} \qquad (3\text{-}12a)$$

and

$$d' = \alpha\,\text{/d/} + (1-\alpha)\,\text{/ae/} \qquad (3\text{-}12b)$$


Forty-two stimuli were used, twenty-one each of b' and d'. Note that α = 0 corresponded to a pure /ae/ for both sets of stimuli; these stimuli were included to obtain a measure of the response bias in favour of either /b/ or /d/. The two series of stimuli were presented in random order in a fully crossed design. All aspects of the stimulus presentation were identical to those of Experiment 1. The subjects' task was to identify each stimulus as /bae/ or /dae/. The subjects were fully informed as to the nature of the experiment and were asked to try as hard as they could to identify whichever stimulus (/b/ or /d/) was being presented. It was specifically pointed out that the /b/ and /d/ stimuli occurred equally often.
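The envelope-matching step used to build the /ae/ masker, followed by the mixing of Equation 3-12, might look like the sketch below. It assumes the /b/ transition has already been segmented into pitch periods of the same length as the replicated /ae/ period, which is a simplification (real pitch periods vary in length); the function names and the random toy signals are hypothetical.

```python
import numpy as np


def energy_matched_masker(period, target_periods):
    """Replicate one vowel pitch period and rescale each copy so that its
    energy (sum of squared samples) equals that of the corresponding
    pitch period of the target (/b/) transition."""
    period = np.asarray(period, dtype=float)
    e_period = np.sum(period ** 2)
    copies = [
        period * np.sqrt(np.sum(np.asarray(t, dtype=float) ** 2) / e_period)
        for t in target_periods
    ]
    return np.concatenate(copies)


def masked_series(trans, masker, alpha):
    """Equation 3-12: alpha * transition + (1 - alpha) * /ae/ masker."""
    return alpha * np.asarray(trans, dtype=float) + (1.0 - alpha) * masker


# Toy usage: three 10 ms "pitch periods" at 16 kHz (160 samples each)
rng = np.random.default_rng(0)
ae_period = rng.standard_normal(160)
b_periods = [g * rng.standard_normal(160) for g in (0.2, 0.5, 1.0)]
masker = energy_matched_masker(ae_period, b_periods)
b_prime = masked_series(np.concatenate(b_periods), masker, alpha=0.5)
```

Matching energy period by period gives the masker the same overall envelope as the /b/ transition while leaving its spectral content that of the vowel, which is what allows the masker to be acoustically similar but phonetically distinct.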

The numbers of /b/ and /d/ responses as a function of Iα were calculated and plotted in Fig. 3.18. (The Iα scale for each subject was adjusted by I50 dB, i.e.,

$$I'_\alpha = I_\alpha - I_{50} \qquad (3\text{-}13)$$

where I50 is the average boundary as determined from Experiment 1.) For purposes of comparison, the average /b/-/d/ identification curve from Experiment 1 is also shown.

Fig. 3.18. /b/ (heavy solid line) and /d/ (light solid line) identification curves in the presence of masking by /ae/. The corresponding /bae/-/dae/ identification curve is shown as the dashed line.

The results show that the /ae/ "transitions" masked the /b/ and /d/ transitions to an extent which increased monotonically with the strength of the /ae/ masker (compare with Fig. 3.9). Furthermore, 100 percent identification of both /b/ and /d/ occurred about 10 dB before the /bae/-/dae/ boundaries obtained in Experiment 1, which indicates that the mutual masking of the /b/ and /d/ is more effective than that of the /ae/ masker. This is not what would be expected if spectral masking were the only factor involved.

One possibility is that some form of phonetic-level inhibition is occurring, in which the outputs of phonetic processors interact so that only the strongest excitation is perceived. This is consistent with the notion of "rapid encoding" of consonantal cues (Pisoni, 1973, 1975; Fujisaki and Kawashima, 1969). The abrupt change of percept may be related to the fact that the acoustic cues for /b/ and /d/ develop over the same interval of time. In that case, some form of mutual interaction during the processing of these two signal components may account for the apparent inability of a subject to perceive both percepts simultaneously.12


3.3 ABX AND AX DISCRIMINATION TESTS

The identification results of Experiment 1, as well as informal observations, indicated that the /b/-/d/ combinations were strongly categorized. A conventional ABX discrimination test was therefore performed using the /bae/-/dae/ stimuli. Its purpose was to test whether categorical perception was occurring according to the contemporary criteria set forth by Studdert-Kennedy et al. (1970).

3.3.1  EXPERIMENT  6:  ABX  Discrimination 

Twenty-one stimuli were prepared by computing the combinations for α ranging from 0 to 1 in steps of 0.05. Each mixture was then scaled by 1/√I′, where I′ is given by

$$I' = 1.00 - 1.36\,\alpha + 1.36\,\alpha^{2} \qquad (3\text{-}14)$$


12  An  informal  experiment  was  conducted  with  a  mixture  of 
/bae/  and  /rae/.  In  this  case,  a  transition  from  /b/  to 
/br/  to  /r/  was  observed,  similar  to  phonological  fusions  in 
dichotic  studies  (see  Cutting,  1974,  1976).  This  supports 
the  notion  that  the  interaction  between  /b/  and  /d/  depends 
in  part  on  their  acoustic  similarity. 
This scaled all signal combinations to the same overall intensity.13 These 21 stimuli were stored in a disk file and accessed by the presentation program described below.
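Equation 3-14 has the form expected for the energy of a weighted sum of two equal-energy waveforms. Suppose s_b and s_d each have unit energy and a normalized cross-correlation ρ. Then the mixture αs_b + (1 − α)s_d has energy α² + (1 − α)² + 2α(1 − α)ρ = 1 − 2(1 − ρ)α + 2(1 − ρ)α², so the coefficient 1.36 would correspond to ρ ≈ 0.32. This interpretation is our inference and is not stated in the text. The sketch checks the algebra and the 1/√I′ scaling on two hypothetical equal-energy tones, which stand in for the recorded /b/ and /d/ transitions.

```python
import math

def i_prime(alpha):
    """Relative intensity of the unscaled mixture, Equation 3-14."""
    return 1.00 - 1.36 * alpha + 1.36 * alpha ** 2

def mix(s_b, s_d, alpha):
    """Weighted sum alpha*s_b + (1 - alpha)*s_d of two equal-length waveforms."""
    return [alpha * b + (1.0 - alpha) * d for b, d in zip(s_b, s_d)]

def scaled_mix(s_b, s_d, alpha):
    """Mixture scaled in amplitude by 1/sqrt(I') so its energy no longer depends on alpha."""
    gain = 1.0 / math.sqrt(i_prime(alpha))
    return [gain * v for v in mix(s_b, s_d, alpha)]

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

# Hypothetical stand-ins for the /b/ and /d/ transitions: two equal-energy
# tones whose phase offset gives a normalized cross-correlation of 0.32.
N = 1000
phase = math.acos(0.32)
s_b = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
s_d = [math.sin(2 * math.pi * 5 * t / N + phase) for t in range(N)]

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    raw = energy(mix(s_b, s_d, alpha)) / energy(s_b)
    norm = energy(scaled_mix(s_b, s_d, alpha)) / energy(s_b)
    print(f"alpha={alpha:.2f}  raw={raw:.3f}  I'={i_prime(alpha):.3f}  scaled={norm:.3f}")
```

With these stand-ins the measured relative energy of the raw mixture matches I′ at every α, and the scaled mixtures all have the energy of the endpoint stimuli. Real speech transitions would have a different correlation, and hence different coefficients.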

The  experimental  setup  for  the  discrimination 
experiment  was  as  described  for  Experiments  1  through  4. 

For each presentation, two stimuli A and B separated by Δα = 0.1 (i.e., two steps) were selected according to a file of randomized numbers. The third stimulus of the ABX paradigm, X, was always either A or B. The interval between A and B, and between B and X, was 750 milliseconds. Subjects were required to press switch 1 if the third stimulus was judged to be the same as the first stimulus, or switch 2 if it was the same as the second stimulus. Three subjects participated in this test, each repeating the discrimination run five times. From the five runs, 40 judgements of the discriminability of each pair of stimuli were obtained.

The results of the ABX discrimination task for the three subjects are shown in Fig. 3.19. The data for each subject have been normalized to a common category boundary using Equation 3-13.


13 A preliminary run showed that the overall intensity difference of the composite stimuli as a function of the weighting parameter α (see Fig. 3.4) aided discrimination. Scaling by 1/√I′ eliminates this variation with α.

Fig. 3.19. ABX discrimination curves for the three subjects. The solid lines are calculated from Equation 3-1.
Inasmuch as all three subjects had by this time become thoroughly familiar with this set of stimuli, discrimination was very difficult. A subject could judge stimuli "different" with relative confidence in only approximately 5 percent of cases or fewer. Most of the time, therefore, the subject had to attempt discriminations between stimuli for which (or so he felt) he could not perceive any difference. This was demanding for the subjects, since they could not tell whether their responses were at all systematic. The results show considerable statistical scatter even when five runs are averaged. Nonetheless, they show enhanced discriminability in the vicinity of the phonetic boundary. For all three subjects, the discrimination curves asymptote to chance level (50 percent) in the wings. These regions are usually called "troughs", but "wings" is more appropriate here.

Discrimination data are typically compared with the Haskins model (Equation 3-1) and/or the Fujisaki and Kawashima model (Equation 3-3) to determine whether or not the continuum is categorically perceived. In the present instance, only the Haskins model can be applied, since the discrimination curves must go to chance level at the endpoints. This follows from the way in which the stimuli are constructed: near α = 0 or α = 1 the stimuli become increasingly pure, and hence discrimination must fall to chance.

The labelling probabilities, p(Iα), were calculated from the normal ogives determined from Experiment 1. Here p(Iα) is the probability of identifying a stimulus specified by Iα as a /b/. The average value of σ (σ = 1.2 dB) was used for this calculation. The predicted functions are shown as the solid lines in Fig. 3.19. For subjects DS and GE, the enhancement of discriminability is roughly that predicted by the model. Not much can be claimed about the quality of fit, since even with five runs the data have not stabilized.14 Nonetheless, a strong peak in the vicinity of the category boundary is evident. The data for subject JH show a very poor fit, with the measured discriminability much greater than that predicted by the Haskins model. This is evidently because this particular subject chose to reduce the ABX test to an AX test (see below).
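The Haskins (covert-labelling) prediction for ABX is usually written P(correct) = ½[1 + (p_A − p_B)²]. The sketch assumes that Equation 3-1 has this standard form. It computes the predicted curve for the two-step pairs of Experiment 6 from a normal-ogive labelling function with σ = 1.2 dB, centred at I50 = 0 dB on the normalized scale. Equation 3-8 is not restated here, so the α-to-dB conversion Iα = 20 log10[α/(1 − α)] is also an assumption. It does match the adaptor levels listed in Table 4-1. The names are illustrative.

```python
import math
from statistics import NormalDist

SIGMA_DB = 1.2          # average ogive standard deviation quoted in the text
PHI = NormalDist().cdf  # standard normal distribution function

def alpha_to_db(alpha):
    """ASSUMED form of Equation 3-8: I_alpha = 20*log10(alpha / (1 - alpha))."""
    if alpha <= 0.0:
        return -math.inf
    if alpha >= 1.0:
        return math.inf
    return 20.0 * math.log10(alpha / (1.0 - alpha))

def p_label_b(i_db, i_50=0.0, sigma=SIGMA_DB):
    """Normal-ogive probability of labelling a stimulus at i_db as /b/."""
    return PHI((i_db - i_50) / sigma)

def haskins_abx(p_a, p_b):
    """Covert-labelling prediction of ABX proportion correct (two categories)."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)

alphas = [round(0.05 * k, 2) for k in range(21)]

# Two-step pairs (delta alpha = 0.1), each indexed by the alpha midway between A and B.
predicted = []
for k in range(len(alphas) - 2):
    p_a = p_label_b(alpha_to_db(alphas[k]))
    p_b = p_label_b(alpha_to_db(alphas[k + 2]))
    predicted.append((alphas[k + 1], haskins_abx(p_a, p_b)))

for mid, p in predicted:
    print(f"alpha = {mid:.2f}: predicted ABX {p:.3f}")
```

Under these assumptions the predicted curve sits at 50 percent in the wings and peaks at about 86 percent for the pair straddling the boundary. This is the kind of profile drawn as the solid lines in Fig. 3.19.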

3.4  AX  DISCRIMINATION 

Various difficulties were encountered with the ABX paradigm described above.

First, the task was very difficult. All three subjects felt that for the most part responses were being given randomly. This may not matter for the paradigm itself, but it matters for the subjects' concentration, since it is difficult to respond consistently when no basis for consistency can be perceived.

Second, it was apparent during the ABX discrimination experiment that subtle differences in the stimuli, noticeable during an identification test, were no longer perceivable.

Third, it was discovered that subjects JH and GE had effectively short-circuited the ABX paradigm. By their own account, they monitored only the last two stimuli (B and X). If these were different, they responded as if the first and third stimuli were identical. If the last two stimuli were judged to be the same, they responded as if the second and third stimuli were the same. (Only the author, subject DS, naively attempted to compare all three stimuli.)

Inasmuch as the nature of the task had been explained to the subjects at the outset, they cannot be faulted for having chosen to make life easy for themselves. The episode simply highlights one of the prime disadvantages of the ABX paradigm: there is more than one possible subject strategy. For other possible subject strategies, see MacMillan et al., 1977; Pollack and Pisoni, 1971; Pierce and Gilbert, 1958.

14 The data for these three subjects cannot be pooled since their identification boundaries are separated by several dB. The transformation I′α = Iα − I50 causes the data points to correspond to different values of Iα.
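The effect of this strategy can be illustrated with a simple model, which is our assumption and not the author's. Suppose the subject detects a real B-X difference with probability h and reports a spurious difference between identical stimuli with probability f. The BX-only rule then scores ½[h + (1 − f)]. That score depends only on how well B and X can be told apart, as in an AX task, and involves no covert labelling of A. The sketch compares this expression with a Monte Carlo run. All function names and rates are hypothetical.

```python
import random

def bx_only_answer(b, x, seems_different):
    """Hypothetical BX-only rule: if B and X seem different, answer 'X = A'
    (switch 1); otherwise answer 'X = B' (switch 2). A is never consulted."""
    return 1 if seems_different(b, x) else 2

def simulate_bx_only(hit_rate, false_alarm_rate, trials=100_000, seed=1):
    """Monte Carlo proportion correct for the BX-only rule on ABX trials."""
    rng = random.Random(seed)

    def seems_different(u, v):
        # Detect a real difference with probability hit_rate; report a
        # spurious difference between identical stimuli with false_alarm_rate.
        return rng.random() < (hit_rate if u != v else false_alarm_rate)

    correct = 0
    for _ in range(trials):
        a, b = "A", "B"
        x = rng.choice((a, b))
        answer = bx_only_answer(b, x, seems_different)
        correct += (answer == 1) == (x == a)
    return correct / trials

def predicted_bx_only(hit_rate, false_alarm_rate):
    """Half the trials have X = A (correct if a difference is detected);
    half have X = B (correct if no false alarm occurs)."""
    return 0.5 * (hit_rate + 1.0 - false_alarm_rate)

print(simulate_bx_only(0.8, 0.1), predicted_bx_only(0.8, 0.1))
```

A listener with good auditory sensitivity to B-X differences can therefore exceed the Haskins prediction, which assumes that only category labels are compared.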

Discrimination testing was continued using a fixed-standard AX paradigm similar to that used by Carney et al. (1977). This was done to overcome the difficulties of the ABX paradigm and to obtain data to test the AX discrimination model presented in Section 2.3. The comparison between the STD model and the phonetic-memory model suggests a number of ways of testing the relative adequacy of the two models within the AX paradigm. The behaviour of the STD model as a function of the observer criterion differs considerably from that of the phonetic-memory model. Consequently, the change in discrimination scores with changes of observer criterion is of particular interest in verifying the model.

3.4.1 EXPERIMENT 7: AX Discrimination Scores

The same stimuli as in Experiment 5 were used in a fixed-standard AX paradigm. On any particular run, one of these stimuli was chosen as the standard against which all 21 stimuli were compared. The order of the standard and test stimuli was randomized, and the interstimulus interval was set at 500 milliseconds. Four subjects were employed, each providing seven individual runs. Only a single subject was tested at one time in order to reduce the number of distractions in the environment. The subjects were fully informed of the composition of the standard stimulus and the proportion of true same-same contrasts.15
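A run of this kind can be sketched as follows. We assume that each of the 21 stimuli, including the standard itself, is paired once with the standard. That assumption is consistent with the statement in this section that approximately 5 percent of the comparisons were between physically identical stimuli. Stimuli are indexed 0 to 20 (α = 0.05 × index). The function name, the randomization of both within-pair order and trial order, and the seeding are hypothetical.

```python
import random

def fixed_standard_run(standard, n_stimuli=21, seed=None):
    """Hypothetical trial list for one fixed-standard AX run.

    ASSUMPTIONS: the standard is paired once with every stimulus on the
    continuum (itself included), the within-pair order is randomized, and
    the trial order is shuffled.
    """
    rng = random.Random(seed)
    trials = []
    for test in range(n_stimuli):
        pair = (standard, test) if rng.random() < 0.5 else (test, standard)
        trials.append(pair)
    rng.shuffle(trials)
    return trials

run = fixed_standard_run(standard=10, seed=3)
same_fraction = sum(1 for first, second in run if first == second) / len(run)
print(f"{len(run)} trials, {same_fraction:.1%} physically identical pairs")
# 21 trials, 4.8% physically identical pairs
```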

The  first  run  was  used  as  a  training  run  to  familiarize 
the  subject  with  the  experimental  task.  This  introduction 
to  the  test  was  necessary  to  ensure  that  the  subject  clearly 
understood  what  was  meant  when  he  was  asked  to  keep  the  same 
criterion  both  within  and  between  runs.  All  subjects  chose 
a  fairly  lax  criterion  for  this  first  run,  attending  mostly 
to  phonetic  differences.  On  later  runs,  they  were 
instructed  to  adopt  either  stricter  or  more  lax  criteria. 


Each subject was run at least seven times in total. Three different standards were used on six of these runs: α = 0 (pure /dae/), α = 1 (pure /bae/), and a value of α close to the subject's phonetic boundary as determined by a preliminary identification run.16 Two replications were obtained for each standard. The subject was instructed to try to maintain the same criterion as he/she had used on the previous run. To help maintain the same criterion, it was suggested to the subjects that they keep roughly the same proportion of "same" and "different" judgements. Each standard (Is) was replicated before a new standard was chosen.

15 The objective was to obtain information on the effect of observer criterion on the AX discrimination curves. There was therefore no point in raising the number of AA contrasts to 50 percent, as done by Carney et al. (1977). Also, a change in observer criterion thus affects not only the number of false alarms (AA contrasts judged "different"), but also the number of "different" judgements across the entire continuum (see 2.11).

16 It was not possible to use the exact boundary stimulus as a standard for all subjects, for two reasons: (a) this is not an exactly definable value of α, and (b) only 21 stimuli in steps of Δα = 0.05 were available. The value of α closest to that subject's average boundary was used.

Testing was carried out on two or more days, with replications always being performed on the same day. Between 15 and 30 minutes was allowed between runs in order to minimize adaptation effects which might occur from repeated presentation of the standard (see Simon and Studdert-Kennedy, 1978). On the last run, the standard was α = 0, and the subject was asked to respond only to phonetic differences. Several additional runs were obtained from subject JH.

The results for the five subjects are shown in Fig. 3.20. The influence of observer criterion is as predicted by the AX discrimination model derived in Section 2.3. For a lax criterion (only phonetic differences), the AX discrimination curve coincides with the identification function, as both the phonetic-memory and STD models predict. Tightening the same-different criterion, so that finer differences are responded to, leaves the discrimination curve sigmoidal. However, its slope increases and it shifts towards whichever endpoint stimulus is the standard. As the criterion is tightened further, the number of within-category responses (and false alarms) increases, also as predicted. Fig. 3.21 shows the individual runs for the pure /dae/ standard stimulus plotted on a single graph.

Fig. 3.20. AX discrimination results. The standard stimulus is indicated in each graph by the arrow on the horizontal axis, and also in the legend by "Is = ". The solid lines represent the fitted model.

Comparing the results of Fig. 3.21 with Figs. 2.9 and 2.13, there seems no doubt that the STD AX discrimination model adequately describes the influence of the observer criterion. The phonetic-memory model (Equation 2-5) can be conclusively rejected, since it does not exhibit this behaviour (compare Fig. 3.20 with Fig. 2.4).

When the standard stimulus in the AX discrimination task is a boundary or near-boundary stimulus, the discrimination function shows a pronounced dip at the location of the standard, as predicted by the STD model. (For this test, all subjects were asked to respond to any differences which they might feel existed. They were to bear in mind that approximately 5 percent of the comparisons would be between physically identical stimuli.) The symmetrical nature of these curves (Figs. 3.20b and 3.20f) indicates that a boundary stimulus is perceptually equidistant from either endpoint. This supports the notion that, insofar as a discrimination test is concerned, phonetic differences are merely "large acoustic differences".

It cannot be inferred directly from these results that categorical perception is occurring, or, for that matter, that it is not occurring. The reason is that a change in observer criterion is, to a first approximation at least, equivalent to an increase or decrease in discriminability. (Likewise, the fact that the ABX curves presented in Fig. 3.19 asymptote to 50 percent does not imply that no within-category discrimination is possible. It can equally well mean that the observer chose not to respond to slight acoustic differences.) To test for categorical perception, the curve-fitting model of AX discrimination derived in Section 2.2 was fitted to the data.

Fig. 3.21. Summary of AX discrimination curves for the /dae/ standard. Compare with Fig. 2.11(b).

3.4.2  Fitting  the  AX  Discrimination  Data 

The analysis of the "dispersion model" of discrimination presented in Section 2.2 suggests that the results of Experiment 7 can be fitted by the model if a suitable unimodal dispersion function is chosen. (For this particular continuum, relative intensity, it is not apparent why the dispersion should be unimodal, but the results clearly indicate that it is.) A convenient trial form for the dispersion function is a Gaussian, as described previously in Section 2.2.

The fitting process began by averaging the replicated runs for each of the subjects. All I values were then adjusted by Equation 3-13 so that the identification boundary for all subjects occurred at I50 = 0 dB.17 This reduces the Gaussian dispersion curve to a single free parameter, since the mean is then positioned at I = 0 for all subjects. To fit the individual discrimination profiles, it was necessary to allow the criterion Δy_c to be fitted to each discrimination run independently.

17 The justification for this is provided below.


The model to which the data were fitted is defined by Equations 2-11, 2-13 and 2-15:

$$p_d = 1 - \Phi \ldots$$
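Equations 2-11, 2-13 and 2-15 are not reproduced in this section. As a hedged illustration only, the sketch implements one plausible reading of the verbal description, built from three assumptions:

- A Gaussian dispersion function of width σ_D, centred at I = 0, whose running integral maps Iα to a perceptual value y.
- A Gaussian random variable Y with standard deviation σ about that value on every presentation.
- A "different" response whenever the two percepts in an AX pair differ by more than the criterion Δy_c.

Under these assumptions the probability of a "different" response is p_d = 1 − [Φ((Δy_c − Δy)/(√2σ)) − Φ((−Δy_c − Δy)/(√2σ))]. The exact expressions in Chapter 2 may differ. The function names and parameter values are hypothetical, apart from Δy_c = 0.2, the value quoted for Fig. 3.22.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def percept(i_db, sigma_d):
    """ASSUMED mapping from I_alpha (dB) to the perceptual axis: the running
    integral of a unit-area Gaussian dispersion function of width sigma_d
    centred at 0 dB."""
    return PHI(i_db / sigma_d)

def p_different(i_std, i_cmp, sigma_d, sigma_y, dy_c):
    """Probability of a 'different' response to a (standard, comparison) pair.

    Each percept carries independent Gaussian noise of s.d. sigma_y, so their
    difference is normal with mean dy and s.d. sqrt(2)*sigma_y; the observer
    responds 'different' when the absolute difference exceeds dy_c.
    """
    dy = percept(i_cmp, sigma_d) - percept(i_std, sigma_d)
    s = math.sqrt(2.0) * sigma_y
    p_same = PHI((dy_c - dy) / s) - PHI((-dy_c - dy) / s)
    return 1.0 - p_same

# Hypothetical parameters: pure /dae/ standard (I_alpha = -inf), sigma_d = 3 dB,
# sigma_y = 0.1; lowering dy_c makes the observer's criterion stricter.
for dy_c in (0.4, 0.2, 0.1):
    profile = [p_different(-math.inf, i, 3.0, 0.1, dy_c) for i in range(-12, 13, 4)]
    print(dy_c, [round(p, 2) for p in profile])
```

In this toy version, lowering Δy_c raises the false-alarm rate at the standard and moves the sigmoid towards the standard. That is qualitatively the criterion effect described for Fig. 3.21.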
The fitting process consisted of varying three sets of parameters: (a) the "width" (i.e., standard deviation) σ_D of the Gaussian dispersion function, (b) the standard deviation σ of the random variable Y, and (c) the criterion Δy_c for each profile. The model was computed for each of the profiles, and the sum-of-squares difference was computed over the whole data set. On each cycle, each parameter was incremented or decremented according to whether the sum-of-squares difference was increasing or decreasing. This basically follows the PEST algorithm of Taylor and Creelman (1967). The iteration was stopped when the sum-of-squares difference changed by less than 0.05 percent.

The resulting model is shown in Fig. 3.22, and the fitted discrimination profiles are shown as the solid lines in Fig. 3.20. In general, the fit is quite good, considering the limited stability of the data. (More than two replications are obviously necessary to stabilize the data; even the five runs of the ABX experiment still showed large scatter.)

Some of the remaining differences arise because the values of I50 chosen for the standard stimulus were based on each subject's average I50 as determined from Experiment 1. I50 may differ by as much as 1 or 2 dB on any particular run. As Fig. 3.9 shows, a change in the location of the maximum of the dispersion function has its greatest effect in the vicinity of a subject's boundary. Thus, the profiles in Fig. 3.20 may not be properly ordered with respect to Is.

This is undoubtedly the case for subjects DS and GR (see Figs. 3.20i and 3.20l). For subject DS the standard stimulus was a clear /dae/, which indicates that it is not as close to the boundary as Fig. 3.20i would imply. Subject GM, on the other hand, commented that she felt the standard stimulus must have been close to the boundary, since she could not unambiguously identify it. Her boundary run (Fig. 3.20f) shows that this is indeed the case. Subject JH, for whom two runs were taken which spanned his normal boundary, shows a reversed symmetry, as would be expected (see Figs. 3.20b and 3.20c).
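The iterative fitting described above can be sketched as a coordinate search. Each parameter is stepped up or down. A step that lowers the sum of squares is kept and enlarged, and a step that fails in both directions is halved. This is a simplification in the spirit of PEST, not Taylor and Creelman's actual rules. The minimum step size is our addition, so that the search does not stop while its steps are still coarse. In the model above, the parameter vector would hold σ_D, σ and one Δy_c per profile. Here a hypothetical quadratic loss stands in for the model's sum of squares.

```python
def fit(loss, params, steps, rel_tol=0.0005, min_step=1e-4, max_cycles=10_000):
    """Minimise loss(params) by stepping one parameter at a time.

    Stops when a full cycle changes the loss by less than rel_tol (0.05
    percent) of its previous value and every step has shrunk below min_step.
    """
    params, steps = list(params), list(steps)
    current = loss(params)
    for _ in range(max_cycles):
        previous = current
        for k in range(len(params)):
            for direction in (1, -1):
                trial = params.copy()
                trial[k] += direction * steps[k]
                value = loss(trial)
                if value < current:
                    params, current = trial, value
                    steps[k] *= 1.5   # success: take bolder steps
                    break
            else:
                steps[k] *= 0.5       # neither direction helped: refine
        small_change = previous - current <= rel_tol * abs(previous)
        if small_change and max(steps) < min_step:
            break
    return params, current

def toy_loss(p):
    # Hypothetical stand-in for the sum-of-squares difference.
    return (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2 + 1.0

best, value = fit(toy_loss, [0.0, 0.0], [0.5, 0.5])
print([round(v, 3) for v in best], round(value, 4))
```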
Fig. 3.22. Best fit for the AX discrimination model for Δy_c = 0.2
The various subjects' data were normalized by shifting the discrimination curves by the amount necessary to align the boundaries at I50 = 0 dB, and this step requires justification. To check it, the fitted model was computed with the dispersion curve offset by amounts I50, where the I50 values were taken from the results of Experiment 1. The discrimination profiles thus generated were compared with profiles calculated for a centered dispersion curve and shifted by the same amount I50. The differences were only of the order of a few percent, which is much less than the scatter observed in the experimental data. Consequently, the data can be pooled in this fashion with only a first-order loss in accuracy.

3.4.3 Summary of Discrimination Results

The  STD  AX  discrimination  model  developed  in  Chapter  2 
was  fitted  to  the  discrimination  data  and  was  found  to 
adequately  characterize  the  experimental  data.  This 
provides  support  for  this  model  as  a  general  model  for 
categorical  perception,  and  demonstrates  that  the  relative 
intensity  continuum  under  investigation  is  categorically 
perceived.  The  phonetic  memory  AX  discrimination  model 
(Equation  2-5)  is  evidently  inadequate  to  account  for  the 
observed  discrimination  data. 


Fig. 3.23. Dispersion function for the best-fit model, plotted in (a) as a function of Iα and in (b) as a function of α. In (b), β is shown, defined as the width of the dispersion function at half-height.

Fig. 3.23a shows the dispersion function obtained from the fitted model. Fig. 3.23b shows the dispersion as a function of the control parameter α. The measure of categoricity previously suggested (Equation 2-17) is computed here on the α scale rather than the I scale, since the endpoint stimuli are an infinite distance apart on the I scale. It is

$$\xi = \frac{1}{\beta} \approx 2.5 \qquad (3\text{-}16)$$

Not much can be made of this index, of course, since there are no comparable indices in the literature. Nonetheless, by comparison with other categorical perception studies, this continuum appears to be as strongly categorized as typical VOT or F2 continua. Values of ξ greater than 10 or so would thus appear to typify what have been called categorically perceived continua.

3.4.4 What is Creating the Dispersion?

Although the choice of a Gaussian function for the underlying dispersion is completely arbitrary, it is justifiable on the grounds that the Gaussian is unimodal and vanishes as Iα → ±∞. This is a necessary condition. The endpoint stimuli for this continuum are uniquely defined by α = 0 (Iα = −∞) and α = 1 (Iα = +∞), and the dispersion function should obviously vanish at these points. At least some of the differences between the model and the data (Fig. 3.20 above) can be attributed to the particular choice of dispersion function.

The origin of the dispersion is at present unclear. It appears to rest on two facts: the signal contains two components which are being processed simultaneously, and the two component signals are shown to be categorically perceived on the F2 continuum. Identification and discrimination results for /bae/ and /dae/ on this relative intensity continuum closely parallel those obtained using … and … continua. This parallel suggests that some common mechanism is involved. One possible explanation is that some interference, perhaps inhibitory, sharpens the boundary between /b/ and /d/, which would lead to an intensity variation as shown in Fig. 3.24 above.

In Chapter 5, a model of a possible inhibitory interaction between two hypothetical neural populations is investigated and incorporated into the SDT model of Section 2.3. This model is shown to possess the properties necessary to account for the identification and discrimination data presented in this chapter. First, however, two more experiments are reported, on selective adaptation and binaural fusion. Both shed light on the nature of the interactions in the perception of the /b/-/d/ stimulus combinations.


CHAPTER 4


SELECTIVE  ADAPTATION 


From the results of Sawusch and Pisoni (1976), Miller (1977) and, to a limited extent, Ainsworth (1977), it can be inferred that the amount of boundary shift under selective adaptation is related to the position of the adaptor along the test continuum. Consider an adaptor that takes on physical values intermediate between the endpoint stimuli of a two-stimulus continuum. The boundary shift then changes from negative at one end of the continuum to positive at the other,1 and is zero when the adaptor is a boundary stimulus.

In auditory adaptation studies, the change in threshold (temporary threshold shift) with adaptor intensity is well known (e.g., Ward et al., 1958). Recent evidence suggests that phonetic boundary shifts are likewise a function of adaptor intensity (Hillenbrand, 1975; Simon, 1977; Miller et al., 1977; Sawusch, 1977). On this basis, selective adaptation to a /b/-/d/ combination ought to result in a boundary shift.

1 The boundary shift is defined as the postadaptation boundary minus the preadaptation boundary. One direction can arbitrarily be called positive.

Suppose the /b/ component of the composite stimulus selectively adapts the /b/ processor2 and the /d/ component selectively adapts the /d/ processor. The boundary shift should then be regulated by the relative desensitization of these processors. When the /b/ and /d/ components are perceptually equal (i.e., the adapting stimulus is a boundary stimulus), no boundary shift is to be expected, since the /b/ and /d/ processors will be affected equally.

As α takes on values greater or less than α50, the adapting stimulus rapidly acquires a unique phonetic identity, and the weaker component becomes subliminal. If the boundary shift is solely a function of the phonetic category of the adaptor, it follows that for this model the boundary shift should depend on α only in the boundary region (see the solid line in Fig. 4.1). If, instead, adaptation is selective at the auditory level rather than at the phonetic level (i.e., the amount of adaptation depends on the relative intensities of /b/ and /d/), a much more gradual influence of adaptor composition should be observed (see the dashed line in Fig. 4.1).

2 It is not being proposed that /b/ and /d/ are processed as intact spectro-temporal patterns. But since the total intensities of the /b/ and /d/ components vary with α, each of their acoustic cues, whatever they may be, varies with α in the same fashion. Thus, within the context of the present paradigm, it is convenient to speak of a "/b/ processor" and a "/d/ processor".

4.1  EXPERIMENT  8:  Selective  Adaptation 

The apparatus for the presentation of the stimuli was as already shown in Fig. 3.6. The presentation program was modified to play back an adaptor of predetermined composition (αa) for three minutes, at a presentation rate of approximately 2 per second. The identification test of Experiment 1 was then conducted, except that a reinforcing adaptation period of 75 presentations followed every 11 stimulus presentations. A one-second 1000 Hz tone signalled the commencement of each reinforcement period.

Two pre-adaptation identification runs preceded each adaptation run. The identification runs lasted approximately eight minutes each, and the adaptation run lasted approximately 35 minutes. The stimuli were presented monaurally to the right ear. Three subjects participated in this study, which involved adaptation to 11 values of αa from 0 to 1 in steps of Δαa = 0.1. The value of αa was chosen at random for any particular run. A second run for each αa was obtained for subject DS.

Fig. 4.1. Hypothesized boundary shifts for phonetic-level adaptation (solid line) and auditory adaptation (dashed line). The grey region represents approximately the boundary zone.

Fig. 4.2 shows a few typical pre- and post-adaptation runs. The boundary shifts were computed by fitting a normal ogive to all identification curves. The axes of Fig. 4.3 are expressed in decibels, where

(4-1)

in which I*50 (the shifted boundary) and Ia (the adaptor) are computed according to Equation 3-8. I50 is the mean preadaptation boundary for that run, as determined from the pre-adaptation identification tests. Table 4-1 shows the pre- and post-adaptation boundaries for the three subjects.
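The text does not say how the normal ogive was fitted. A least-squares fit on probit-transformed proportions is one simple possibility, and maximum-likelihood fitting is a common alternative. The sketch assumes the probit approach. It then applies the definition of boundary shift from footnote 1 of this chapter: the postadaptation boundary minus the preadaptation boundary. The function names are illustrative. The example call uses the first JH entries of Table 4-1.

```python
from statistics import NormalDist

def fit_ogive(levels_db, proportions_b):
    """Estimate (I50, sigma) of p(/b/) = Phi((I - I50) / sigma).

    Proportions are probit-transformed and a straight line z = (I - I50)/sigma
    is fitted by least squares. Points at exactly 0 or 1 have infinite probits
    and are skipped.
    """
    z_of = NormalDist().inv_cdf
    pts = [(i, z_of(p)) for i, p in zip(levels_db, proportions_b) if 0.0 < p < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two proportions strictly between 0 and 1")
    n = len(pts)
    mean_i = sum(i for i, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((i - mean_i) ** 2 for i, _ in pts)
    sxz = sum((i - mean_i) * (z - mean_z) for i, z in pts)
    slope = sxz / sxx                      # equals 1 / sigma
    intercept = mean_z - slope * mean_i    # equals -I50 / sigma
    return -intercept / slope, 1.0 / slope

def boundary_shift(i50_pre, i50_post):
    """Postadaptation boundary minus preadaptation boundary (dB)."""
    return i50_post - i50_pre

print(round(boundary_shift(-0.37, -8.36), 2))  # -7.99
```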

Subjects GR and JH demonstrate very similar boundary shifts over the entire α range. Since their average preadapted boundaries differed by only 1.2 dB, their data were averaged.3


3 All averages were performed with the boundary locations specified in decibels. Thus, with respect to the underlying intensity ratios, the means are geometric means.
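A decibel value is a logarithm of a ratio. The arithmetic mean of values in dB is therefore the dB value of the geometric mean of the underlying ratios. The short check below illustrates this with hypothetical boundary values. It uses the 20 log10 (amplitude) convention; a 10 log10 power convention leads to the same conclusion.

```python
import math

def db_to_ratio(db):
    """Amplitude ratio for a level in dB (20*log10 convention)."""
    return 10.0 ** (db / 20.0)

def ratio_to_db(ratio):
    return 20.0 * math.log10(ratio)

boundaries_db = [-0.9, 0.2, 1.5]   # hypothetical boundary locations
mean_db = sum(boundaries_db) / len(boundaries_db)

ratios = [db_to_ratio(b) for b in boundaries_db]
geometric = math.prod(ratios) ** (1.0 / len(ratios))
harmonic = len(ratios) / sum(1.0 / r for r in ratios)
arithmetic = sum(ratios) / len(ratios)

# Averaging in dB reproduces the geometric mean, not the other two.
for name, value in (("geometric", geometric), ("harmonic", harmonic), ("arithmetic", arithmetic)):
    print(f"{name:>10}: {ratio_to_db(value):+.4f} dB  (mean of dB values: {mean_db:+.4f})")
```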
Fig. 4.2. Typical pre- and post-adaptation identification curves

Fig. 4.3. Boundary shift as a function of adaptor composition
TABLE 4-1

PRE- AND POST-ADAPTATION BOUNDARIES

                      JH                      GR                      DS
ADAPTOR      UNADAPT     ADAPT       UNADAPT     ADAPT       UNADAPT     ADAPT
 (dB)        I50 (dB)    I*50 (dB)   I50 (dB)    I*50 (dB)   I50 (dB)    I*50 (dB)

 -19.0        -0.37       -8.36        0.17       -8.32        2.45       -0.45
 -12.0        -0.34       -5.91       -0.13       -5.35        2.56       -0.86
  -7.4        -1.56       -6.79        0.40       -4.60        2.72       -1.47
  -3.5        +0.59       -3.53       -0.03       -2.42        2.51       -0.19
   0.0        -1.40       -2.05        0.34       -0.31        3.01       -0.13
   3.5        -0.87       -0.23        0.12        0.87        2.72        2.37
   7.4        -0.96        2.63       -0.40        1.71        2.77        4.26
  12.0        -0.60        3.39        0.88        4.72        2.83        4.46
   --         -0.55        3.72        0.13        4.66        2.05        5.13
   --         -1.08        3.52        0.00        4.35        3.17        5.72
   --         -3.35        1.98        0.15        4.38        2.66        5.82

The data for subject DS were not included in the averaging, since the differences in boundary shifts between subject DS and either GR or JH appear to be of the order of a factor of two.⁴ Fig. 4.3 shows that significant boundary shifts are produced for non-boundary adaptors. This suggests that phonetic-level adaptation is not the prime determinant of the boundary shift, since boundary shifts are a strong function of adaptor composition for adaptors up to 10 dB or so from the normal unadapted boundary. It appears that whatever physiological mechanism is responsible for the shift, it is sensitive to the acoustic form of the signal and not just its phonetic value.

There is an observation to be made concerning the perceived quality of the stimuli during the adaptation run. Approximately one-third of the way into the test, both the adaptor and the test stimuli start to sound "fuzzy" or "rough", as if noise were being added to the signal. This is characteristic of tonal signals under auditory fatigue (Hirsh and Ward, 1952; Davis et al., 1950) but so far has not been noted in the selective adaptation literature. It was consistent for all runs, and noticed by all three subjects. In general, test stimuli which were of the same category as the adaptor tended to sound noisier than those belonging to the opposite category. In spite of this change of stimulus quality, no discernible influence on the slope of the identification curve is observed (see Fig. 4.4). The first point on Fig. 4.4 corresponds to an identification curve with an abnormally large slope, and this point also corresponds to an abnormally large boundary shift (see Fig. 4.3). Both JH and GR exhibited large boundary shifts and large boundary regions for this particular run. Two replications (see Experiment 9) of this point by subject JH failed to reproduce either the large shift or the large slope, so this point must be considered suspect.

⁴ This was also observed in a pilot study in which a complete run was performed on subject DS and a few test points were obtained from subject JH. The boundary shifts for JH were a factor of two greater in this study also.

Sawusch (1977) and Miller (1975) used confidence ratings to measure within-category changes in VOT judgements due to adaptation, and their results show that the quality rating decreased for within-category stimuli. They interpreted this as a desensitization of the detector response functions over their entire domain. Although neither Miller nor Sawusch indicates the nature of this qualitative change in the percepts, it is possible that it is due in part to increased noisiness of the signal, as found in the present experiment. As one possible explanation, Sawusch suggests that

"... the within category effect seems to be characteristic of adaptation at the peripheral frequency specific auditory level and a fatigue interpretation of the adaptation at this level seems to be appropriate." (p. 749).

He also suggests as an alternative explanation a "retuning" operation at the central level, but the type of change in signal quality observed in the present experiment would argue in favour of the first interpretation.

Fig. 4.4. Width of the identification boundary (σ) as a function of adaptor composition

4.2  Summary 

Experiment 8 provides information on how the /b/ and /d/ processors interact when selectively adapted, and it is clear that there are within-category adaptation effects.

This finding is consistent with the results of Sawusch (1977) and Miller (1975), and rules out the possibility that the adaptation is occurring at a level where only the phonetic identity of the adapting stimulus is preserved. Also, it was noted during the selective adaptation runs that repeated playback of an adaptor close to the boundary does not result in a "flip-flop" of the percept between the two phonetic categories, as may occur in reversible visual figures (Taylor and Aldridge, 1974). Rather, the adaptor is perceived as a slightly noisy exemplar of one of the categories, and this identity persists for the entire adaptation run. While it is possible to induce the opposite percept for a stimulus near the boundary, there always appears to be a preferred percept on repeated playback (see Section 3.2.6). In this light, phonetic-level adaptation is ruled out, since the boundary shifts would then take on only two discrete values, one for each category of adaptor.

Fig. 4.3 shows that this is not the case.
The alternative possibility (which will be explored with a mathematical model in Chapter 5) is that the degree of adaptation is strictly determined by the acoustic composition of the adaptor. If the /b/ and /d/ components are recognized by separate processors, neural adaptation at a level where the spectro-temporal information is preserved should result in a reduced input to each processor. The outputs of these two processors ("detectors", if you will) would then be reduced by an amount which depends on the composition of the adaptor, and a boundary shift will ensue.

In order to pursue this line of analysis, more information on the response characteristics of these hypothesized processors is necessary. In particular, it is desirable to know how the boundary shift depends on the intensity of a single (i.e., pure /bae/ or /dae/) adaptor. This information is necessary to decide whether the boundary shift for the composite stimulus can be predicted from the effects of each component independently.

4.3  EXPERIMENT  9:  Effect  of  Adaptor  Intensity 

Experiment 8 was replicated using /dae/ adaptors of various intensities. Two of the original three subjects participated. Experimental sessions were conducted every two or three days at the subjects' convenience (it is desirable to leave at least 24 hours between adaptation sessions to avoid any cumulative effects). The adaptor intensities were chosen in the range 0 dB to -17 dB re the full /dae/ stimulus intensity. The intensity for any particular session was selected at random.

The results (Fig. 4.5) show that the boundary shift increases with adaptor intensity, as expected. Significant boundary shifts commence when the /dae/ adaptor intensity is of the order of 50 dB SPL, which is consistent with threshold shifts for exposure to narrow-band noise (Ward et al., 1958; Trittipoe, 1958). The boundary shifts for /dae/ have roughly the same reproducibility as those shown in Fig. 4.3, but since the shifts are everywhere smaller, the relative uncertainty is proportionally greater. Several points were replicated in an attempt to stabilize the boundary shift estimates. Fig. 4.6 shows the boundary shifts for the composite adaptors and the /dae/ adaptors plotted on the same graph. From this diagram, it can be seen that the rate of change of boundary with change in intensity of the /dae/ adaptor is greatest for low adaptor intensities. Assuming that the effect of a /bae/ adaptor is similar, the boundary shifts predicted on the basis of each processor being affected independently by the corresponding /b/ or /d/ component can be represented, to a first approximation, as the difference of the boundary shifts associated with the /bae/ and /dae/ adaptors independently. This leads to a curve of the form shown in Fig. 4.7, which has a curvature exactly opposite to that obtained in Fig. 4.3. Although this is only a crude estimate of the effects of a combined /b/ and /d/ adaptor, it still suggests that the effect of the composite adaptor is not simply related to the effects of either signal component taken independently. The relationship between Experiments 8 and 9 will be considered again in Chapter 5, where a model of the adaptation process is constructed.

Fig. 4.5. Boundary shifts as a function of the intensity of the /dae/ adaptor

Fig. 4.6. Comparison of boundary shifts for the /dae/ adaptor (thick lines) and the composite /bae/-/dae/ adaptor (thin lines)
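The first-approximation prediction behind Fig. 4.7 (the composite shift taken as the difference of the shifts produced by each component acting alone) can be sketched numerically. Everything quantitative here is an assumption: `single_adaptor_shift` and its parameters are hypothetical stand-ins chosen only so that the shift rises most steeply at low intensity, as described for the /dae/ data; the component with weight α is assumed to shift the boundary in the positive direction; and its level is taken as 20 log10 α dB re full level.

```python
import math


def single_adaptor_shift(level_db, max_shift=10.0, onset_db=-20.0, scale_db=6.0):
    """Hypothetical boundary shift (dB) for a pure adaptor at level_db re full level.

    Zero at or below onset_db, then rising with its steepest slope at onset and
    saturating toward max_shift. All parameters are illustrative assumptions.
    """
    if level_db <= onset_db:
        return 0.0
    return max_shift * (1.0 - math.exp(-(level_db - onset_db) / scale_db))


def predicted_composite_shift(alpha):
    """Difference-of-shifts prediction for a composite adaptor of weight alpha.

    The component of weight alpha is assumed to push the boundary in the
    positive direction and the component of weight 1 - alpha equally in the
    negative direction (the symmetry assumed in the text).
    """
    def level(weight):
        # Amplitude weight -> intensity level in dB re the full component.
        return -math.inf if weight <= 0.0 else 20.0 * math.log10(weight)

    return single_adaptor_shift(level(alpha)) - single_adaptor_shift(level(1.0 - alpha))


for alpha in [i / 10 for i in range(11)]:
    print(f"alpha = {alpha:.1f}: predicted shift = {predicted_composite_shift(alpha):+.2f} dB")
```

Because each single-adaptor function saturates, the difference flattens toward the ends of the α range; this is only a qualitative illustration of the construction, not a fit to the data.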
Fig. 4.7. Hypothesized boundary shifts for the case of both "pure" /bae/ and /dae/ adaptors and the composite /bae/-/dae/ adaptor (assuming the shift due to the composite adaptor is basically equal to the difference between the shifts associated with the "pure" adaptors)


CHAPTER  5 


MODELLING  THE  EXPERIMENTAL  RESULTS 


The model of AX discrimination developed in Chapter 2 required the specification of an arbitrary dispersion function in order to fit the experimental results. In this chapter, a model of the monaural fusion paradigm is developed which leads to the required dispersion, and is extended to include selective adaptation. The motivation for the model stems from the consideration that, whatever the acoustic cues for /bae/ and /dae/ are, they must vary with the relative intensity of the respective signal components. The basic assumption is that each signal component results in a degree of excitation in a separate neural population which is monotonic with the intensity of the stimulating signal. These populations will be referred to as "detectors".

To formulate the model, a number of assumptions are necessary. The major assumption (that of separate /b/ and /d/ processors) is the most difficult to substantiate. Support for this assumption comes from the fact that a boundary stimulus can be perceived as a member of either phonetic category, thus demonstrating that both sets of acoustic cues are simultaneously available for processing. The informal test conducted during Experiment 1 to determine how far the boundary could be moved by controlling overt response bias (Fig. 3.11) also supports this assumption, and shows that in the transition region the acoustic cues for both /bae/ and /dae/ are simultaneously available.

The AX discrimination results show (by virtue of the extracted dispersion function) that this relative intensity continuum is categorically perceived. The fact that the dispersion function did not come out to be a delta function (which is hardly surprising) indicates that some within-category discrimination is possible, and that the discriminability decreases to zero as either of the two signal components vanishes. The indications are (re Fig. 3.22) that the influence of the weaker signal component is detectable to about ±15 dB or so away from the phonetic boundary. The boundary shifts in the selective adaptation experiment (Experiment 8) also show that the weaker signal is an effective adaptor out to approximately ±15 dB, perhaps beyond. Taken together, these two experiments suggest that the respective effects depend not on the phonetic value of the stimulus, but rather on its acoustic structure.

5.1 A PRELIMINARY MODEL OF /b/-/d/ DETECTION

The first attempt at a model will assume that each component of the composite stimulus is detected by a separate neural population, and that the degree of excitation in these populations scales with stimulus intensity as a power-law function, as for the growth of loudness. In the discussion which ensues, it is important to note that this modelling is intended only as functional, and does not purport to have any anatomical or physiological correspondences. To commence the model, let u1 and u2 be variables representing the degree of neural excitation corresponding to the /d/ and /b/ components of the stimulus respectively. The excitation of the two neural ensembles can then be expressed as

$$u_1 = k\,I_d^{\theta} \tag{5-1a}$$

$$u_2 = k\,I_b^{\theta} \tag{5-1b}$$

where $I_b$ and $I_d$ are the intensities of the /b/ and /d/ components respectively, and k is a normalizing constant whose value is of no interest here. Using Equations 3-4 to define the signal energies¹ of the /b/ and /d/ components, the excitation can be expressed as a function of α:

$$u_1(\alpha) = k'\,(1-\alpha)^{2\theta} \tag{5-2a}$$

$$u_2(\alpha) = k'\,\alpha^{2\theta} \tag{5-2b}$$

where k' has absorbed both k and the absolute conversion factor between I and α². Since k' merely determines the scale for u1 and u2, for computational purposes it can be taken as unity.

To complete this model, it is only necessary to specify the exponent θ. The commonly accepted exponent for the growth of loudness is θ = 0.27 (Stevens, 1971; Zwislocki, 1969; Luce, 1977). Using θ = 0.27 and Equations 5-2, u1 and u2 can be calculated for values of α between 0 and 1. The result is a "stimulus trajectory" in the u1-u2 plane (see Fig. 5.1). Assuming, as for the detector model in Section 2.3, that u1 and u2 are the means of two (uncorrelated) random variables U1 and U2 with variance σ², the point (u1, u2) represents the centroid of a bivariate circular normal probability distribution in the u1-u2 plane. Using the decision variable $(U_2 - U_1)/\sqrt{2}$, the probability of identifying a given stimulus characterized by α as, say, /b/ is

$$P(\alpha) = \Phi\!\left[\frac{u_2(\alpha) - u_1(\alpha)}{\sqrt{2}\,\sigma}\right] \tag{5-3}$$

This is a sigmoidal curve, and thus for a suitable choice of σ it will resemble the identification functions of Experiment 1. The identification data alone, however, are insufficient to test the model, since any stimulus trajectory which crosses the decision line (u2 = u1) only once will lead to a sigmoid-shaped identification function. A more stringent test of the model is provided by the AX discrimination data of Section 3.4. From Section 3.3, a suitable dispersion function was determined to be

$$D(\alpha) = \frac{u_1(\alpha)\,u_2'(\alpha) - u_1'(\alpha)\,u_2(\alpha)}{u_1^2(\alpha) + u_2^2(\alpha)} \tag{5-4}$$

Using u1 and u2 defined by Equations 5-2 above, this becomes

$$D(\alpha) = \frac{2\theta\left[(1-\alpha)^{2\theta}\,\alpha^{2\theta-1} + (1-\alpha)^{2\theta-1}\,\alpha^{2\theta}\right]}{(1-\alpha)^{4\theta} + \alpha^{4\theta}} \tag{5-5}$$

For θ = 0.27, the dispersion function appears as shown in Fig. 5.2. This dispersion function shows no enhanced dispersion, and an attempt to fit this model to the discrimination results would be fruitless.

¹ Since it is not clear which energies are being monitored, the total energy (Equations 3-4) is used as an estimate of the stimulus intensity. Again, since the intensity of any cue varies as the total energy, this should not be a significant source of error.

Fig. 5.1. Stimulus trajectory for θ = 0.27

Fig. 5.2. Dispersion function for θ = 0.27
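A minimal numerical sketch of Equations 5-2 to 5-5 (with k' = 1, as in the text) can be used to check the closed form 5-5 against the general dispersion formula 5-4, evaluated with finite-difference derivatives. The function names are hypothetical, σ is a free parameter, and the response associated with u2 follows the "say, /b/" convention of Equation 5-3.

```python
import math
from statistics import NormalDist


def trajectory(alpha, theta):
    """Equations 5-2 with k' = 1: excitations (u1, u2) for composition alpha."""
    return (1.0 - alpha) ** (2.0 * theta), alpha ** (2.0 * theta)


def p_identify(alpha, theta, sigma):
    """Equation 5-3: probability of the response associated with u2."""
    u1, u2 = trajectory(alpha, theta)
    return NormalDist().cdf((u2 - u1) / (math.sqrt(2.0) * sigma))


def dispersion_closed_form(alpha, theta):
    """Equation 5-5."""
    a, b = alpha, 1.0 - alpha
    num = 2.0 * theta * (b ** (2 * theta) * a ** (2 * theta - 1)
                         + b ** (2 * theta - 1) * a ** (2 * theta))
    den = b ** (4 * theta) + a ** (4 * theta)
    return num / den


def dispersion_numeric(alpha, theta, h=1e-6):
    """Equation 5-4 with central-difference derivatives, as a cross-check."""
    u1, u2 = trajectory(alpha, theta)
    lo, hi = trajectory(alpha - h, theta), trajectory(alpha + h, theta)
    u1p = (hi[0] - lo[0]) / (2.0 * h)
    u2p = (hi[1] - lo[1]) / (2.0 * h)
    return (u1 * u2p - u1p * u2) / (u1 ** 2 + u2 ** 2)


for theta in (0.27, 2.0):
    row = [dispersion_closed_form(a / 10, theta) for a in range(1, 10)]
    print(theta, [round(d, 3) for d in row])
```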

Equations 5-2 and 5-3 constitute a model with two parameters: θ and σ. Now, it should be clear from Section 3.2 that if the dispersion function is approximately constant, changing σ will not generate enhanced dispersion. The only possibility that remains is to allow θ to take on different values. Fig. 5.3 shows various stimulus trajectories and corresponding dispersion curves for θ = 0.27 to θ = 2.0. It can be seen that as θ becomes large, the dispersion function does indeed become unimodal, and by making θ arbitrarily large, the peak of the dispersion function can be made arbitrarily narrow. It would appear, then, that in order to make this model fit the discrimination data, it is necessary to let θ = 2.0 or greater. However, this seems unacceptable, since such large exponents are not noted for the growth of loudness with stimulus intensity. As will be shown below, inclusion of some form of inhibition between these hypothetical neural populations will produce the required dispersion. In order to do this, however, it is first necessary to formulate a more detailed model of the excitation process.
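Factoring the numerator of Equation 5-5 makes the role of θ easier to see. The following short derivation follows directly from Equation 5-5 and is offered as a check rather than as part of the original argument:

$$D(\alpha) = \frac{2\theta\,\alpha^{2\theta-1}(1-\alpha)^{2\theta-1}\left[(1-\alpha) + \alpha\right]}{\alpha^{4\theta} + (1-\alpha)^{4\theta}} = \frac{2\theta\,\alpha^{2\theta-1}(1-\alpha)^{2\theta-1}}{\alpha^{4\theta} + (1-\alpha)^{4\theta}}$$

$$D\!\left(\tfrac{1}{2}\right) = \frac{2\theta\,(1/2)^{4\theta-2}}{2\,(1/2)^{4\theta}} = 4\theta, \qquad D(\alpha) \sim 2\theta\,\alpha^{2\theta-1} \quad (\alpha \to 0)$$

Hence for θ < 1/2 (including θ = 0.27) the dispersion is unbounded near α = 0 and α = 1 rather than concentrated near the boundary, whereas for θ > 1/2 it falls to zero at both end points while its central value 4θ grows in proportion to θ.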
Fig. 5.3. (a) Stimulus trajectories and (b) dispersion functions for various values of θ
5.2 INTENSITY CODING BY NEURAL POPULATIONS

Consider a population consisting of $N_0$ hypothetical neurons. For the sake of the present discussion, these cells are assumed to be capable of becoming excited (i.e., firing) in response to external stimulation. $N_0$ is assumed sufficiently large that the mean level of excitation is insensitive to minor perturbations by individual neurons in the population. The fundamental assumptions are:

(a) the probability of firing is the same for all cells in the population which are in the "ground state"

(b) the probability per unit time of a cell firing in response to a stimulus of intensity I is given by

$$\gamma' = \gamma\,f(I) \tag{5-6}$$

where f(I) initially will be assumed to be a power law, i.e., $f(I) = I^{\theta}$.

Now, a cell, once excited, is no longer capable of firing for a certain time (i.e., it is in a refractory state). After a time typically of the order of 1 or 2 milliseconds, the cell recovers, and the probability of recovery per unit time will be denoted by $\gamma_r$. Zwislocki (1969), for a neurally-based model of temporal summation, suggests a value of 300 sec⁻¹ for $\gamma_r = 1/\tau_r$, and this value will be assumed here. The process of recovery will be referred to as "de-excitation". The excitation/de-excitation process is illustrated schematically in Fig. 5.4.

The rate processes shown in Fig. 5.4 lead to the differential equation

$$\frac{dN}{dt} = \gamma f(I)\,N_g - \gamma_r N \tag{5-7}$$

where $N_g$ is the number of cells in the ground state at time t, and N is the number of excited cells. Requiring that the total number of cells in the population remain constant, i.e.,

$$N_0 = N + N_g \tag{5-8}$$

Equation 5-7 becomes

$$\frac{dN}{dt} = \gamma f(I)\,N_0 - \left[\gamma f(I) + \gamma_r\right] N \tag{5-9}$$

As long as I is invariant with time, this differential equation has constant coefficients and, assuming the initial condition N(0) = 0, has the solution

$$N(t) = \frac{\gamma f(I)\,N_0}{\gamma f(I) + \gamma_r}\left[1 - e^{-\left[\gamma f(I) + \gamma_r\right] t}\right] \tag{5-10}$$

$N_0$ can be arbitrarily taken as unity, in which case N(t) represents the proportion of excited cells in the ensemble.

Fig. 5.4. Schematic model of excitation/de-excitation processes in a hypothetical neural population

This result shows that a population of cells can behave as an energy integrator with a time constant given by

$$\tau = \frac{1}{\gamma f(I) + \gamma_r} \tag{5-11}$$

From Equation 5-11, it follows that if the probability of a cell firing increases with stimulus intensity, the time constant of the integration decreases with intensity. Such a reduction in time constant with stimulus intensity has been noted many times in auditory and visual temporal summation studies (cf. Roufs, 1975; Marks, 1972; Stevens and Hall, 1966; Small, Brandt and Cox, 1962; Miller, 1948). For times t << τ, Equation 5-10 can be approximated by

$$N(t) \approx N_0\,\gamma f(I)\,t \tag{5-12}$$

which, assuming $f(I) = I^{\theta}$, is the familiar "law" of temporal summation (i.e., for constant N, $I^{\theta} t$ = constant). For longer times, t >> τ, the saturation level of excitation is just

$$N_\infty = \frac{\gamma f(I)\,N_0}{\gamma f(I) + \gamma_r} = N_0\,\frac{I^{\theta}}{I^{\theta} + I_0^{\theta}} \tag{5-13}$$

where $I_0^{\theta} = \gamma_r/\gamma$. This particular response function has enjoyed a great deal of popularity in the vision literature (e.g., Mansfield, 1976; Marks, 1972, 1974; Alpern, 1971) and has also been investigated as a model of the transfer function of sensory transducers (Lipetz, 1971).


For intensities I such that $I \ll I_0$, Equation 5-13 reduces to the familiar psychophysical power law:

$$N_\infty \approx N_0 \left(\frac{I}{I_0}\right)^{\theta} \tag{5-14}$$

Thus, for intensities low enough that saturation does not occur in the neural mass, this model is linear with respect to the driving function f(I) ($I^{\theta}$ in this case). Any nonlinearity with respect to I comes from the driving function. Therefore, in order to obtain power-law behaviour with this model, it is necessary to supply it as f(I).
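The excitation model of Equations 5-9 to 5-14 can be checked numerically. The sketch below integrates Equation 5-9 with a fourth-order Runge-Kutta step and compares the result with the closed-form solution 5-10 and the steady state 5-13. The value $\gamma_r$ = 300 sec⁻¹ is the one assumed in the text; γ = 1500 sec⁻¹ (giving the ratio γ/γr = 5 that is adopted as a nominal value in Section 5.4), θ = 0.27, and the intensities used (in arbitrary relative units) are illustrative assumptions, as are the function names.

```python
import math


def excitation_closed_form(t, drive, gamma_r, n0=1.0):
    """Equation 5-10, with drive = gamma * f(I) (a rate in 1/s)."""
    k = drive + gamma_r
    return drive * n0 / k * (1.0 - math.exp(-k * t))


def excitation_rk4(t_end, drive, gamma_r, n0=1.0, steps=10_000):
    """Integrate Equation 5-9, dN/dt = drive*N0 - (drive + gamma_r)*N, by RK4."""
    def rhs(n):
        return drive * n0 - (drive + gamma_r) * n

    dt = t_end / steps
    n = 0.0  # initial condition N(0) = 0
    for _ in range(steps):
        k1 = rhs(n)
        k2 = rhs(n + 0.5 * dt * k1)
        k3 = rhs(n + 0.5 * dt * k2)
        k4 = rhs(n + dt * k3)
        n += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n


def steady_state(intensity, gamma, gamma_r, theta, n0=1.0):
    """Equation 5-13: N0 * I**theta / (I**theta + I0**theta), I0**theta = gamma_r/gamma."""
    i0_theta = gamma_r / gamma
    return n0 * intensity ** theta / (intensity ** theta + i0_theta)


GAMMA_R = 300.0  # recovery rate, 1/s (value assumed in the text)
GAMMA = 1500.0   # illustrative firing-rate scale, 1/s (assumption)
THETA = 0.27     # loudness exponent used in Section 5.1

intensity = 0.5
drive = GAMMA * intensity ** THETA
tau = 1.0 / (drive + GAMMA_R)  # Equation 5-11
print(f"tau = {tau * 1e3:.3f} ms, N(5 ms) = {excitation_rk4(0.005, drive, GAMMA_R):.4f}")
```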

Equation 5-10, since it yields power-law behaviour for non-saturating intensities, is not an improvement over Equations 5-1. Admittedly, this solution provides the time dependence of the excitation process, but only the steady-state solution (5-13) is of interest here. Although it is hardly realistic to assume that I = constant, there is little point in dwelling on the temporal behaviour of N, since it is not known exactly which energies in the /b/-/d/ stimuli are being monitored. Also, from a practical standpoint, if stimulus intensity were not considered independent of time, Equation 5-9 may or may not have an analytic solution. In any event, this formulation of the model is only an intermediate step to enable the inclusion of mutual inhibition between the two neural populations.

5.3 INHIBITION BETWEEN NEURAL POPULATIONS

Neural models which incorporate inhibitory processes demonstrate behaviour quite different from models which incorporate only excitation processes (Wilson and Cowan, 1972). The Wilson and Cowan model differs somewhat in its development from the one given here, but is based on similar reasoning and premises, and demonstrates similar behaviour.

Consider the two neural populations shown schematically in Fig. 5.5. Suppose that the degree of activation of the inhibitory connections is some function of the excitation in the parent population (i.e., the population doing the inhibiting). The probability per unit time that a cell in population 2 will be inhibited can then be expressed as $\gamma_{i2}\,h(N_1)$, where N1 is the level of excitation in the first population, and h is some as yet unspecified function. The excitation in the two masses is then governed by the following four differential equations:
$$\frac{dN_1}{dt} = \gamma f(I_1)\,N_{g1} - \gamma_r N_1 \tag{5-15a}$$

$$\frac{dN_{i1}}{dt} = \gamma_{i1}\,h(N_2)\,N_{g1} - \gamma_{ir} N_{i1} \tag{5-15b}$$

$$\frac{dN_2}{dt} = \gamma f(I_2)\,N_{g2} - \gamma_r N_2 \tag{5-15c}$$

$$\frac{dN_{i2}}{dt} = \gamma_{i2}\,h(N_1)\,N_{g2} - \gamma_{ir} N_{i2} \tag{5-15d}$$

Fig. 5.5. Schematic configuration of two mutually inhibiting neural populations

Fig. 5.6. State diagram of two mutually inhibiting neural populations

These equations can be represented by the state-level diagram shown in Fig. 5.6. In this model, a neuron can be excited, inhibited, or in the ground state, and furthermore can only be in one of these states at one time. Equations 5-15 are typical of Volterra-style population models, and techniques for investigating their possible solutions and stability points have long been a topic of interest to mathematicians and biologists. In the present case, no direct significance can be attached to the time-dependent solutions, so only the steady-state solution will be considered. This is obtained by setting the above derivatives to zero, i.e.,

$$\gamma f(I_1)\,N_{g1} - \gamma_r N_1 = 0 \tag{5-16a}$$

$$\gamma_{i1}\,h(N_2)\,N_{g1} - \gamma_{ir} N_{i1} = 0 \tag{5-16b}$$

(It is necessary only to show two of the equations; the two equations for the other population can be obtained by interchanging subscripts 1 and 2.) Substituting $N_{g1} = N_{01} - N_1 - N_{i1}$ into Equations 5-16a and 5-16b, these become


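$$\gamma f(I_1)\,N_{01} - \left[\gamma f(I_1) + \gamma_r\right] N_1 - \gamma f(I_1)\,N_{i1} = 0 \tag{5-17a}$$

$$\gamma_{i1} h(N_2)\,N_{01} - \gamma_{i1} h(N_2)\,N_1 - \left[\gamma_{i1} h(N_2) + \gamma_{ir}\right] N_{i1} = 0 \tag{5-17b}$$

Letting $r = \gamma/\gamma_r$ and $r_{i1} = \gamma_{i1}/\gamma_{ir}$, Equations 5-17 can be rewritten to yield

$$r f(I_1)\,N_{01} - \left[r f(I_1) + 1\right] N_1 - r f(I_1)\,N_{i1} = 0 \tag{5-18a}$$

$$r_{i1} h(N_2)\,N_{01} - r_{i1} h(N_2)\,N_1 - \left[r_{i1} h(N_2) + 1\right] N_{i1} = 0 \tag{5-18b}$$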


These two equations (and the two obtained by reversing the subscripts) cannot be solved explicitly for N1 and N2, but eliminating $N_{i1}$ between Equations 5-18a and 5-18b (and likewise $N_{i2}$ for the second population) condenses them to the following implicit relations:

$$N_1 = \frac{r f(I_1)\,N_{01}}{r f(I_1) + 1 + r_{i1} h(N_2)} \tag{5-19a}$$

$$N_2 = \frac{r f(I_2)\,N_{02}}{r f(I_2) + 1 + r_{i2} h(N_1)} \tag{5-19b}$$

Note that when $r_{i1} = r_{i2} = 0$ (i.e., no inhibition), these reduce to the steady-state solution found previously for the non-inhibitory case (Equation 5-13).
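The elimination step leading to Equations 5-19 can be checked numerically: for arbitrary positive parameter values, the N1 given by Equation 5-19a, together with the $N_{i1}$ obtained by solving Equation 5-18b, should make the left-hand sides of Equations 5-18a and 5-18b vanish. In the sketch below, `h2` stands for the value h(N2) treated as a given number; the function names and the parameter grid are hypothetical.

```python
import itertools


def implicit_n1(r, f1, ri1, h2, n01=1.0):
    """Equation 5-19a: steady-state excitation of population 1."""
    return r * f1 * n01 / (r * f1 + 1.0 + ri1 * h2)


def inhibited_level(ri1, h2, n1, n01=1.0):
    """Equation 5-18b solved for N_i1, the inhibited cells in population 1."""
    return ri1 * h2 * (n01 - n1) / (ri1 * h2 + 1.0)


def residual_18a(r, f1, n1, ni1, n01=1.0):
    """Left-hand side of Equation 5-18a (zero at a steady state)."""
    return r * f1 * n01 - (r * f1 + 1.0) * n1 - r * f1 * ni1


def residual_18b(ri1, h2, n1, ni1, n01=1.0):
    """Left-hand side of Equation 5-18b (zero at a steady state)."""
    return ri1 * h2 * n01 - ri1 * h2 * n1 - (ri1 * h2 + 1.0) * ni1


worst = 0.0
for r, f1, ri1, h2 in itertools.product((0.5, 5.0), (0.1, 0.7), (0.0, 3.0), (0.05, 0.4)):
    n1 = implicit_n1(r, f1, ri1, h2)
    ni1 = inhibited_level(ri1, h2, n1)
    worst = max(worst, abs(residual_18a(r, f1, n1, ni1)), abs(residual_18b(ri1, h2, n1, ni1)))
print(f"largest residual: {worst:.1e}")
```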
N1 and N2 can only be calculated from Equations 5-19 if the function h(N) is specified. There is no obvious choice for h, so the simplest solution is to choose a class of functions which includes the trivial function h(N) = N as one of its members. A suitable general function (which has no particular significance except that it is monotonic) is

$$h(N) = N^{n} \tag{5-20}$$

Using $f(I) = I^{\theta}$ together with $I_1 = (1-\alpha)^2$ and $I_2 = \alpha^2$ (and $N_{01} = N_{02} = 1$), N1 and N2 are then defined by

$$N_1 = \frac{r\,(1-\alpha)^{2\theta}}{r\,(1-\alpha)^{2\theta} + 1 + r_{i1} N_2^{\,n}} \tag{5-21a}$$

$$N_2 = \frac{r\,\alpha^{2\theta}}{r\,\alpha^{2\theta} + 1 + r_{i2} N_1^{\,n}} \tag{5-21b}$$

For n = 1, these equations can be solved explicitly for N1 and N2, but not for powers greater than 1. However, the solution for N1 and N2 can be computed iteratively using Newton's method (e.g., Conte, 1964, p. 43).
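A minimal sketch of the iterative solution is given below. Rather than a two-dimensional Newton step, it eliminates N2 by substituting Equation 5-21b into 5-21a and applies a scalar Newton iteration with a finite-difference derivative. This reduction, the starting guess, and the default parameter values (r = 5, the nominal value adopted in Section 5.4; θ = 0.27; $r_{i1} = r_{i2}$ = 5; n = 2) are assumptions made for illustration, and `solve_steady_state` is a hypothetical name.

```python
def solve_steady_state(alpha, theta=0.27, r=5.0, ri1=5.0, ri2=5.0, n=2,
                       tol=1e-12, max_iter=100):
    """Solve Equations 5-21a/5-21b for (N1, N2) by scalar Newton iteration.

    N2 is eliminated by substituting 5-21b into 5-21a, leaving
    F(N1) = N1 - g1(g2(N1)) = 0. With strong inhibition several roots may
    exist; Newton's method then returns the one reached from the start point.
    """
    a1 = r * (1.0 - alpha) ** (2.0 * theta)
    a2 = r * alpha ** (2.0 * theta)

    def g1(n2):  # Equation 5-21a
        return a1 / (a1 + 1.0 + ri1 * n2 ** n)

    def g2(n1):  # Equation 5-21b
        return a2 / (a2 + 1.0 + ri2 * n1 ** n)

    def F(n1):
        return n1 - g1(g2(n1))

    n1 = a1 / (a1 + 1.0)  # start from the uninhibited steady state (Eq. 5-13)
    h = 1e-7
    for _ in range(max_iter):
        slope = (F(n1 + h) - F(n1 - h)) / (2.0 * h)
        step = F(n1) / slope
        n1 -= step
        if abs(step) < tol:
            break
    else:
        raise RuntimeError("Newton iteration did not converge")
    return n1, g2(n1)


for alpha in (0.2, 0.4, 0.5, 0.6, 0.8):
    n1, n2 = solve_steady_state(alpha)
    print(f"alpha={alpha:.1f}  N1={n1:.4f}  N2={n2:.4f}")
```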

Fig. 5.7a shows typical isoclines for n = 1 and n = 2 for α = 0.4, and Fig. 5.7b shows the corresponding stimulus trajectories. The n = 1 model shows a stimulus trajectory reminiscent of that produced for large values of θ in the non-inhibitory model (Fig. 5.3b), and is undesirable for the same reasons. The n = 2 model, on the other hand, shows genuine dispersion in the N1-N2 plane. The reason for the dispersive power of the n = 2 model can be seen by considering how N(α) grows with α for the two populations. These curves are shown in Fig. 5.8a for n = 1 and in Fig. 5.9a for n = 2, and the corresponding dispersion curves are shown in Figs. 5.8b and 5.9b. The n = 2 model achieves its dispersion by sharpening the difference between the excitation levels N1 and N2 (evidenced by the inflection in the N vs. α curves).

Fig. 5.7. (a) Isoclines for α = 0.4 for n = 1 (thin lines) and n = 2 (thick lines). (b) Stimulus trajectories for the n = 1 (triangles) and n = 2 (circles) models

The dispersion curves shown in Fig. 5.8b show an undesirable property: the dispersion becomes infinite for α = 0 and α = 1. This occurs because $f(I) = I^{\theta}$ has an infinite slope at I = 0 when θ is less than unity. This will create problems for the model in trying to accommodate the data for small or large values of α, and a correction factor may be required to remedy this defect. A possible correction is presented later.

This is the basic model for identification and discrimination. A given stimulus of composition specified by α results in two levels of excitation, N1 and N2. All judgements concerning the categorization and discrimination presumably are transformations of these two variables. Using the mechanics already established in Section 3.3 for dispersion of a two-detector configuration, the application of the model to the discrimination data can now proceed.

Fig. 5.8. (a) Excitation curves N1(α) and N2(α) for the n = 1 model. The curves differ by values of $r_i$, the degree of mutual inhibition. (b) Dispersion curves corresponding to the excitation curves shown in (a)

Fig. 5.9. (a) Excitation curves N1(α) and N2(α) for the n = 2 model. The curves differ by values of $r_i$, the degree of mutual inhibition. (b) Dispersion curves corresponding to the excitation curves in (a)

5.4 FITTING THE AX DISCRIMINATION DATA

To complete the AX discrimination model, the excitation levels N1 and N2 are identified with the former variables μ1 and μ2 respectively (i.e., Equations 5-2). Since N1 and N2 are calculated from deterministic differential equations, the randomness must be superimposed afterwards. That is, N1 and N2 are assumed to be the means of two random variables with equal variance σ². This assumption of constant variance may be unjustified (Luce, 1977), but a specific functional relationship between σ and intensity is lacking.² Any simple monotonic dependence of σ on α will probably be obscured by the flexibility of the model, so as a first approximation σ will be assumed constant. In any event, given the complexity of the model, simplifying assumptions are desirable until the model has been adequately investigated.

It was pointed out earlier that a plausible value for γ_r is 300 sec⁻¹. Now, the quantity γf(I) is also a probability per unit time, and ought to have a similar value. A nominal value of r = γ/γ_r = 5 is chosen, although it turns out that the behaviour of the model is not particularly sensitive to the specific value of r. To reduce the number of free parameters in the model, r is left set at this nominal value.

² Durlach and Braida (1969), in a model for intensity detection and discrimination, make the assumption that the variance is independent of intensity. For simplicity, essentially the same assumption is made here, except that the variance is assumed to be independent of the level of excitation (which in this model is the neural counterpart of intensity).

σ is a parameter whose value is equally hard to establish, since it depends on the experimental circumstances as well as on the particular stimuli used. Because of this, it will be allowed to float as a free parameter in the fitting of the model, at least to obtain a representative value. It ought to be comparable to the difference limen for intensity, i.e., perhaps 1 dB or so (which translates on the α scale to approximately 0.06). Given quantization noise, amplifier noise and recording noise, it is conceivable that it could be even larger.
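The figure of 0.06 can be reproduced under one reading of the conversion, namely that α is a linear amplitude weight and that the 1 dB step is taken about the midpoint α = 0.5. Both points are assumptions about how the conversion was made; the helper name below is hypothetical.

```python
def db_to_delta_alpha(db, alpha_ref=0.5):
    """Change in a linear amplitude weight alpha produced by a level change of
    `db` decibels, taken about the reference weight alpha_ref.

    Assumptions: alpha scales amplitude (20*log10 convention) and the
    reference point is the middle of the continuum, alpha_ref = 0.5.
    """
    return alpha_ref * (10.0 ** (db / 20.0) - 1.0)


print(round(db_to_delta_alpha(1.0), 3))  # 0.061
```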

Equations 5-22 now contain only two free parameters, σ and r_i1 (for the present it will be assumed that r_i1 = r_i2 = r_i). Following the procedure outlined previously in Section 3.3, the AX discrimination scores were fitted by calculating the decision variable

(5-22)

(The decision variable is the mapping corresponding to the dispersion function given by Equation 5-4, and represents the angular distance of the point (N1, N2) from the decision line shown in Fig. 2-13.) An adaptive least squares fit was performed to the data shown in Fig. 3.20, and representative results are shown in Fig. 5.10. The fit is observed to be similar to that of the Gaussian dispersion function fitted in Section 3.4.1, which is to be expected since both dispersion functions have similar shapes. The influence of the enhanced dispersion for small and large values of α is evident in Figs. 5.10b and 5.10c. The failure of the dispersion curve to asymptote to zero at the ends of the scale causes the discrimination curve in Fig. 5.10c to increase rapidly away from α = 1. This is more pronounced for the n=2 model than for the n=1 model.
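The decision variable is described here only geometrically. A minimal sketch, assuming that the decision line is N1 = N2 (the boundary condition used later for selective adaptation) and that noise of equal standard deviation σ is added independently to N1 and N2 after the deterministic calculation, is given below. The function names and the example means 0.72 and 0.18 are hypothetical; σ = 0.1 matches the order of the fitted noise level quoted below.

```python
import math
import random


def angular_distance(N1, N2):
    """Signed angle (radians) between the point (N1, N2) and the line N1 = N2.

    Assumption: the decision line is N1 = N2. The result is positive when
    N1 > N2 and zero on the decision line.
    """
    return math.pi / 4 - math.atan2(N2, N1)


def noisy_angles(N1, N2, sigma, trials=20000, seed=1):
    """Superimpose independent Gaussian noise of equal standard deviation on the
    deterministic means N1, N2 and return samples of the decision variable."""
    rng = random.Random(seed)
    return [angular_distance(rng.gauss(N1, sigma), rng.gauss(N2, sigma))
            for _ in range(trials)]


samples = noisy_angles(0.72, 0.18, sigma=0.1)
mean_angle = sum(samples) / len(samples)
print(round(angular_distance(0.72, 0.18), 3), round(mean_angle, 3))
```

Because the added noise is isotropic, the sampled angles scatter symmetrically about the noise-free angle, so σ broadens the decision variable without shifting it.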

The extracted values of r_i for the n=1 and n=2 models are 186 and 25 respectively. The standard deviation of the noise distribution in both cases stabilizes around 0.1 ± 0.05. This value of σ is equivalent to a noise width of approximately 3 dB, which is surprisingly large since it is of the order of the width of the identification function itself. Part of the reason for this large value is that the model does not follow the trend of the data perfectly. Consequently, in the least squares fitting process, the parameters adopt whichever values are necessary to minimize the overall sum of squares difference, and this can be accomplished in many ways. Also, since each AX discrimination profile is fitted with its own bias factor (see Section 3.4), there is not necessarily a unique minimum sum of squares. More importantly, however, there is a tradeoff between σ and r_i. The effective dispersion function can be roughly approximated by a convolution of the noise distribution with the actual dispersion function, and therefore an increase in σ can be approximately accounted for by an increase in r_i. This tradeoff prevents accurate determination of these two parameters. Nonetheless, the values of σ and r_i quoted above will be used in the selective adaptation model derived below, since their combined effect produces the required dispersion.

Fig. 5.10. Comparisons of n=1 (solid lines) and n=2 (dashed lines) models for AX discrimination data.

The  n=1  model  provides  a  superior  fit  for  the  AX 
discrimination  data,  mostly  due  to  the  more  acceptable  low 
intensity  behaviour  of  the  dispersion  function  (compare  Fig. 
5.8b  with  Fig.  5.9b).  In  the  following  section,  the  above 
model of the /b/-/d/ recognition paradigm is extended to
include  the  effects  of  the  selective  adaptation  experiments 
(Experiments  8  and  9),  and  it  will  be  seen  at  that  time  that 
the  n=1  model  cannot  account  for  these  data.  But,  it  must 
be  remembered  that  the  choice  of  inhibition  function 
(Equation  5-21)  is  arbitrary,  so  selecting  between  these  two 
models  on  the  basis  of  superiority  of  fit  does  not  verify 
either  choice  of  model.  The  present  interest  is  only  in 
demonstrating  the  basic  functional  form  of  the  model,  since 
this model is only one member of a class of models.

5.5 MODELLING SELECTIVE ADAPTATION

Auditory  threshold  shifts  due  to  continued  exposure  to 
tones  or  noises  have  been  investigated  since  Hood  (1950) . 
Although  it  has  never  been  decisively  established  that 
adaptation/fatigue  results  in  a  loudness  decrement  per  se 
(cf. Petty, Fraser and Elliott, 1970), this is one of the
common explanations (Small, 1963; Hood, 1950). Whatever the
origin of adaptation/fatigue, certain regularities are
observed. First, the amount of threshold shift increases
with  the  duration  of  the  adaptor  (Ward,  Glorig  and  Sklar, 
1958,  1959).  Second,  it  increases  with  the  intensity  of  the 

adaptor,  at  least  for  adapting  intensities  up  to  110  dB  SPL 
or  so  (Trittipoe,  1958;  Ward  et  al.  ,  1958;  Selters,  1964). 
Third,  recovery  is  more  or  less  exponential,  with  several 
components  with  different  time  constants  being  identifiable 
(Botsford,  1968;  Hirsh  and  Ward,  1952).  Several  of  these 
components  have  short  time  constants,  possibly  representing 
some  form  of  neural  adaptation  or  renewable  metabolic 
processes.  One  of  the  components  has  a  time  constant  of  the 
order of hours, perhaps representing auditory fatigue
(Botsford,  1968). 

Selective  adaptation  studies  use  a  similar  experimental 
paradigm  but  use  "phonetic  boundary  shift"  as  the  measured 
variable. The phonetic boundary shifts in selective adaptation studies are observed to (a) increase with adaptor intensity (Sawusch, 1977; Miller, Eimas and Foot, 1977; Experiment 9, this thesis), and (b) increase with the number of adaptor presentations (Simon and Studdert-Kennedy, 1978).

Furthermore, the phonetic boundary evidently returns to normal or near-normal within a few hours or so, which indicates that the adaptation effects are due to renewable physiological processes. While there is little agreement in either experimental domain concerning the locus or origin of adaptation/fatigue, the effects themselves are nonetheless real and also fairly reproducible.

While it is not likely to be the case that all of the TTS or phonetic boundary shift is due to neural adaptation, it seems reasonable to suppose that some of it is, especially for low intensity adaptors. In any event, this assumption will be made for the present modelling purposes, since it allows a straightforward inclusion of the effects of neural adaptation into the model derived in the previous section. To incorporate adaptation as an aspect of the excitation of a neural population, an "adaptation level" is added to the state diagram of Fig. 5.6. The state-level diagram then appears as shown in Fig. 5.11. This new level represents a level at which cells may "collect", thus removing them from possible further excitation. The return to the ground state provides for the observation that adaptation effects are temporary and wear off with the passage of time. Within the context of this model, when stimulation ceases, all adapted states will eventually return to the ground state at a rate determined by γ_ar. Since the data to be analyzed do not involve the temporal course of entry into or recovery from adaptation/fatigue, only one component (represented by the single level in Fig. 5.11) will be considered. A more general model would include perhaps four levels, each representing a different time constant, but for the present purposes only one will be considered.

Fig. 5.11. State diagram including both inhibition and adaptation (symbolized by level N_a).


This appears to be a reasonable first approximation to the adaptation process. When stimulation begins, the number of cells in the adapted state will be nil, but as stimulation continues, the number of "adapted cells" will increase. For prolonged or intense stimulation, the number of fatigued cells may become a significant fraction of the total number of cells in the population. Because of the return path to the ground state, the number of adapted cells will stabilize for a given intensity of adapting stimulation. This is consistent with experimental data, since changes of threshold with adaptation and recovery are approximately exponential (e.g., Wright, 1959; Keeler, 1968). Evidence for the existence of simultaneous but independent fatigue and recovery processes is provided by Ward, Glorig and Selters (1960). In this experiment, subjects were first exposed to 105 dB SPL octave-band noise (1200-2400 Hz) for thirty minutes, followed by exposure to the same noise at 95 dB SPL. The TTS curves show that following the reduction in the intensity of the fatiguing signal, a decrease in TTS occurs, followed by a gradual increase (see Ward et al., 1960, Fig. 2). They conclude that this cannot be explained in terms of rate processes (charging and discharging of a capacitor in their analogy), but the Keeler (1968) analysis based on two time constants rather than one shows that this behaviour is indeed possible with such a model. In all likelihood, if the phenomenon of TTS is characterized by rate processes, the rates must be intensity-dependent.

The adaptation model is derived by generalizing rate equations 5-15:

$$\frac{dN_1}{dt} = \gamma f(I_1)\,N_{g1} - \gamma_r N_1 - \gamma_a N_1 \tag{5-23a}$$

$$\frac{dN_{a1}}{dt} = \gamma_a N_1 - \gamma_{ar} N_{a1} \tag{5-23b}$$

$$\frac{dN_{i1}}{dt} = \gamma_{i1}\, h(N_2)\, N_{g1} - \gamma_{ir} N_{i1} \tag{5-23c}$$


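Using the constraint that N01 = N_g1 + N1 + N_a1 + N_i1, Equations 5-23 become (on setting the derivatives to zero for the steady state):

$$\gamma f(I_1) N_{01} - \left(\gamma f(I_1) + \gamma_r + \gamma_a\right) N_1 - \gamma f(I_1) N_{a1} - \gamma f(I_1) N_{i1} = 0 \tag{5-24a}$$

$$\gamma_a N_1 - \gamma_{ar} N_{a1} = 0 \tag{5-24b}$$

$$\gamma_{i1} h(N_2) N_{01} - \gamma_{i1} h(N_2) N_1 - \gamma_{i1} h(N_2) N_{a1} - \left(\gamma_{i1} h(N_2) + \gamma_{ir}\right) N_{i1} = 0 \tag{5-24c}$$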


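Dividing Equation 5-24a by γ_r, 5-24b by γ_ar and 5-24c by γ_ir, these equations become

$$r f(I_1) N_{01} - \left(r f(I_1) + 1 + r_a\right) N_1 - r f(I_1) N_{a1} - r f(I_1) N_{i1} = 0 \tag{5-25a}$$

$$r_{ar} N_1 - N_{a1} = 0 \tag{5-25b}$$

$$r_{i1} h(N_2) N_{01} - r_{i1} h(N_2) N_1 - r_{i1} h(N_2) N_{a1} - \left(r_{i1} h(N_2) + 1\right) N_{i1} = 0 \tag{5-25c}$$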


where r_a = γ_a/γ_r and r_ar = γ_a/γ_ar. Coefficients r and r_i1 are as defined earlier. These are three linear equations in N1, N_a1 and N_i1. Eliminating N_a1 using Equation 5-25b, i.e.,

$$N_{a1} = r_{ar} N_1 \tag{5-26}$$

the solution for N1 is found to be

$$N_1 = \frac{r f(I_1)\, N_{01}}{r f(I_1)\,(1 + r_{ar}) + (1 + r_a)\left(r_{i1} h(N_2) + 1\right)} \tag{5-27}$$
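As a consistency check on Equations 5-23 to 5-27, the rate equations for one population can be integrated forward in time, holding the inhibitory input h(N2) fixed, and compared with the closed-form steady state. The sketch below uses forward Euler with illustrative, dimensionless rate constants; none of these values comes from the text, and r_i1 is taken to be γ_i1/γ_ir (an assumption consistent with dividing Equation 5-24c by γ_ir). With the chosen rates, r = 5, r_a = 0.5, r_ar = 2 and r_i1 = 3.

```python
def integrate_rate_equations(f_I1=1.0, h_N2=0.5, N01=1.0,
                             gamma=1.0, gamma_r=0.2, gamma_a=0.1, gamma_ar=0.05,
                             gamma_i1=0.3, gamma_ir=0.1, dt=0.01, t_end=1000.0):
    """Forward-Euler integration of Equations 5-23a-c for one population.

    h(N2) is held fixed at h_N2; all rate values are illustrative assumptions.
    """
    N1 = Na1 = Ni1 = 0.0  # start with every cell in the ground state
    for _ in range(int(t_end / dt)):
        Ng1 = N01 - N1 - Na1 - Ni1  # ground-state cells from the constraint
        dN1 = gamma * f_I1 * Ng1 - (gamma_r + gamma_a) * N1
        dNa1 = gamma_a * N1 - gamma_ar * Na1
        dNi1 = gamma_i1 * h_N2 * Ng1 - gamma_ir * Ni1
        N1 += dt * dN1
        Na1 += dt * dNa1
        Ni1 += dt * dNi1
    return N1, Na1, Ni1


def steady_state_N1(f_I1=1.0, h_N2=0.5, N01=1.0,
                    gamma=1.0, gamma_r=0.2, gamma_a=0.1, gamma_ar=0.05,
                    gamma_i1=0.3, gamma_ir=0.1):
    """Closed-form steady state of Equation 5-27 built from the same rates."""
    r = gamma / gamma_r
    r_a = gamma_a / gamma_r
    r_ar = gamma_a / gamma_ar
    r_i1 = gamma_i1 / gamma_ir
    rf = r * f_I1
    return rf * N01 / (rf * (1 + r_ar) + (1 + r_a) * (r_i1 * h_N2 + 1))


N1_t, Na1_t, Ni1_t = integrate_rate_equations()
print(N1_t, steady_state_N1())  # the two values should agree closely
```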


Similarly, the solution for the other neural population is

$$N_2 = \frac{r f(I_2)\, N_{02}}{r f(I_2)\,(1 + r_{ar}) + (1 + r_a)\left(r_{i2} h(N_1) + 1\right)} \tag{5-28}$$

which is obtained from Equation 5-27 by reversing the subscripts. Using h(N) = N^n as before, N1 and N2 can be calculated by solving Equations 5-27 and 5-28 simultaneously. Note that when r_a = 0 and r_ar = 0 (i.e., γ_a = 0), this solution reduces to that found previously for the no-adaptation case (Equations 5-19).
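Because N1 appears in the expression for N2 and vice versa, the two equations have to be solved together. A minimal sketch is given below. It substitutes Equation 5-28 into Equation 5-27 and finds a self-consistent N1 by bisection; this is one convenient numerical choice, not necessarily the procedure used in the thesis. Assumptions: the relative intensities are I1 = α and I2 = 1 - α, f(I) = I^θ with an illustrative θ = 0.3, N01 = N02 = 1, r = 5 (the nominal value chosen in Section 5.4), and r_i = 25 with n = 2 (the values extracted for the n=2 model).

```python
def f(I, theta=0.3):
    """Power-law loudness function f(I) = I**theta (theta is an illustrative value)."""
    return I ** theta


def steady_state(alpha, r=5.0, r_i=25.0, n=2, r_a=0.0, r_ar=0.0,
                 N01=1.0, N02=1.0, theta=0.3, tol=1e-12):
    """Solve Equations 5-27 and 5-28 together for (N1, N2), with h(N) = N**n.

    Assumption: the relative intensities are I1 = alpha and I2 = 1 - alpha.
    """
    F1, F2 = r * f(alpha, theta), r * f(1.0 - alpha, theta)

    def N1_given(N2):  # Equation 5-27
        return F1 * N01 / (F1 * (1 + r_ar) + (1 + r_a) * (r_i * N2 ** n + 1))

    def N2_given(N1):  # Equation 5-28
        return F2 * N02 / (F2 * (1 + r_ar) + (1 + r_a) * (r_i * N1 ** n + 1))

    # A self-consistent N1 is a zero of g(N1) = N1 - N1_given(N2_given(N1)).
    # g(0) <= 0 and g(N01) > 0, so bisection on [0, N01] brackets a root.
    # With strong inhibition there could be more than one root; this returns one.
    lo, hi = 0.0, N01
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - N1_given(N2_given(mid)) < 0:
            lo = mid
        else:
            hi = mid
    N1 = 0.5 * (lo + hi)
    return N1, N2_given(N1)


for a in (0.2, 0.5, 0.8):
    N1, N2 = steady_state(a)
    print(f"alpha = {a}: N1 = {N1:.3f}, N2 = {N2:.3f}")
```

Setting r_i = 0 in the same function removes the mutual inhibition, which makes it easy to compare how strongly the inhibition sharpens the difference between N1 and N2.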


5.5.1 Comparison with Keeler's (1968) Model

Keeler (1968) attempted to model the increase of TTS with noise exposure using a lumped-parameter circuit model. His general model consists of two exponentials:

$$\mathrm{TTS} = \mathrm{TTS}_\infty\left(1 - k\,e^{-t/\tau_1} - (1-k)\,e^{-t/\tau_2}\right) \tag{5-29}$$
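Equation 5-29 can be evaluated directly. The sketch below uses the fatigue-stage time constants of 5 and 47 minutes quoted in the next paragraph; the weight k = 0.5 and TTS∞ = 1 are illustrative assumptions. TTS starts at zero, rises monotonically and approaches TTS∞.

```python
import math


def keeler_tts(t, tts_inf=1.0, k=0.5, tau1=5.0, tau2=47.0):
    """Two-exponential growth of TTS (Equation 5-29); t, tau1, tau2 in minutes.

    tau1 and tau2 default to the fatigue-stage values from the fit to the
    Ward et al. (1958) data; k and tts_inf are illustrative assumptions.
    """
    return tts_inf * (1.0 - k * math.exp(-t / tau1) - (1.0 - k) * math.exp(-t / tau2))


for t in (0, 5, 30, 120, 480):
    print(t, round(keeler_tts(t), 3))
```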

The fit of this model to the Ward et al. (1958) auditory adaptation data shows that for the fatigue stage, τ1 = 5 minutes and τ2 = 47 minutes. Similarly, for the recovery stage, τ1 = 11.1 minutes and τ2 = 250 minutes. Keeler's model can be derived quite simply from Equations 5-23 if (a) the inhibition terms are omitted, and (b) two adaptation levels are provided. The resulting rate equations in this case are:


$$\frac{dN}{dt} = \gamma f(I)\,N_g - \gamma_r N - \gamma_{a1} N - \gamma_{a2} N \tag{5-30a}$$

$$\frac{dN_{a1}}{dt} = \gamma_{a1} N - \gamma_{ar1} N_{a1} \tag{5-30b}$$

$$\frac{dN_{a2}}{dt} = \gamma_{a2} N - \gamma_{ar2} N_{a2} \tag{5-30c}$$


The solution for N is readily found using Laplace transform techniques (e.g., McCollum and Brown, 1965) and is of the form

$$N(t) = N_\infty + A e^{-t/\tau_1} + B e^{-t/\tau_2} + C e^{-t/\tau_3} \tag{5-31}$$
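The three time constants of Equation 5-31 are the negative reciprocals of the eigenvalues of the linear system 5-30, once the ground-state population is eliminated through N_g = N_0 - N - N_a1 - N_a2. The sketch below computes them with NumPy for illustrative rates. The assumptions are that γ_r ≈ 300 sec⁻¹ (the value suggested earlier), γf(I) = 5γ_r, and that the adaptation and recovery rates are simply the reciprocals of Keeler's fatigue (5, 47 min) and recovery (11.1, 250 min) time constants. The resulting slow time constants are therefore not expected to reproduce Keeler's values; the point is the separation of time scales.

```python
import numpy as np


def rate_matrix(gf, g_r, g_a1, g_a2, g_ar1, g_ar2):
    """Coefficient matrix of Equations 5-30 for x = (N, N_a1, N_a2).

    N_g = N_0 - N - N_a1 - N_a2 has been substituted; the constant N_0 term
    only shifts the equilibrium and does not affect the time constants.
    """
    return np.array([
        [-(gf + g_r + g_a1 + g_a2), -gf, -gf],
        [g_a1, -g_ar1, 0.0],
        [g_a2, 0.0, -g_ar2],
    ])


minute = 60.0  # seconds
# Illustrative rates in 1/s (assumptions, see the lead-in above).
M = rate_matrix(gf=1500.0, g_r=300.0,
                g_a1=1 / (5 * minute), g_a2=1 / (47 * minute),
                g_ar1=1 / (11.1 * minute), g_ar2=1 / (250 * minute))
eigenvalues = np.linalg.eigvals(M).real
time_constants = sorted(-1.0 / eigenvalues)  # seconds, shortest first
print(time_constants)
```

One time constant is a fraction of a millisecond (the excitation/de-excitation process), while the other two are minutes to hours long.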


One of the exponential terms corresponds to the excitation/de-excitation process and can be neglected when times of the order of minutes are being discussed. The remaining two decaying exponentials are identified as the two exponentials of Keeler's equation. Thus, Keeler's equation is basically a solution to a special case of the present model, and his estimates of the time constants, although derived for TTS, will be used as estimates of the time constants for selective adaptation.³

³ Since auditory adaptation/fatigue is a well known environmental hazard and a precursor to various types of hearing disorders, it is reasonable to suppose that at least some of the same physiological effects occur under repeated presentation of speech sounds. Whether or not other (e.g., phonetic) effects also exist is a moot point, and there are no data which conclusively decide either way. Since the general feeling is that adaptation is an auditory rather than phonetic effect, it seems reasonable at this point to assume that the same physiological mechanisms are involved as in TTS, and that the same time constants apply.

In fact, one parameter can be eliminated from Equation 5-27 using Keeler's estimates of the time constants. Since, in Equation 5-27, r_a = γ_a/γ_r and γ_r is expected to be of the order of 300 sec⁻¹, r_a should be of the order of 10⁻⁶. Equation 5-27 involves only the sum 1 + r_a, so r_a can be omitted, in which case

$$N_1 = \frac{r f(I_1)\, N_{01}}{r f(I_1)\,(1 + r_{ar}) + r_{i1} h(N_2) + 1} \tag{5-32}$$

A similar result holds for Equation 5-28.
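The order-of-magnitude estimate of r_a can be checked with a few lines of arithmetic. The assumption here is that γ_a is roughly the reciprocal of one of Keeler's fatigue-stage time constants; since it is not stated which one applies, both are tried. Either way r_a lies between about 10⁻⁶ and 10⁻⁵, so dropping it from the sum 1 + r_a changes that sum by roughly one part in 10⁵ or less.

```python
gamma_r = 300.0  # 1/s, the order of magnitude quoted for gamma_r
keeler_fatigue_minutes = (5.0, 47.0)  # Keeler's fatigue-stage time constants

r_a_values = []
for tau_min in keeler_fatigue_minutes:
    gamma_a = 1.0 / (tau_min * 60.0)  # assumption: gamma_a ~ 1/tau, in 1/s
    r_a = gamma_a / gamma_r
    r_a_values.append(r_a)
    print(f"tau = {tau_min:4.0f} min: r_a = {r_a:.1e}")
```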


5.5.2  Modelling  the  Selective  Adaptation  Paradigm 

Equations 5-27 and 5-28 yield the values of N1 and N2 in response to sustained exposure to a stimulus of constant intensity. Since these are the steady state solutions, they do not contain time as an explicit parameter, and no statements can be made concerning the temporal development of (or recovery from) adaptation. Thus, to model the results of Experiments 8 and 9, it is necessary to assume that "complete" adaptation has occurred, i.e., that the boundary shifts have stabilized. Considering the number of repetitions of the adaptors and the results of auditory adaptation studies, this assumption should not be a major source of error.


A further assumption which must be made is that the amount of adaptation incurred by a single presentation of a test stimulus is negligible compared to that caused by the presentation of the adaptor. Since the time constants of recovery from adaptation/fatigue are large compared to the duration of a single stimulus, after adaptation the number of cells which can potentially be excited in response to a stimulus is

$$N'_{01} = N_{01} - N_{a1} \tag{5-33a}$$

where N_a1 is given by Equation 5-26. (The number of excited states remaining will be assumed nil, since the time constant for recovery from the excited state is expected to be very short.) Using Equation 5-26, the effective number of cells available for excitation is

$$N'_{01}(I_a) = N_{01} - r_{ar} N_1(I_a) \tag{5-34}$$

where N1(I_a) is the steady state level of excitation due to prolonged presentation of an adapting stimulus of intensity I_a. Thus, the neural population effectively has a reduced sensitivity as a result of the prolonged stimulation. The presentation of a test stimulus in the adapted state can therefore be modelled by computing N'01 from Equation 5-34 above and then using this value of N'01 instead of N01. The boundary stimulus after adaptation is the value of α for which N1 = N2.⁴
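The procedure just described can be sketched in three steps: solve for the steady state under the adaptor, reduce each population by r_ar times its adapted excitation (Equation 5-34), and search for the test composition α at which N1 = N2. The code below is a hypothetical illustration under stated assumptions, not the fitting program used for Experiments 8 and 9. Assumptions: I1 = α and I2 = 1 - α; population 2 is the one driven by /dae/ (the text identifies α = 0 with pure /dae/); f(I) = I^θ with an illustrative θ = 0.3; r = 5, r_i = 25 and n = 2; the adaptor has unit relative intensity; r_ar = 0.2, chosen only to be of the same order as the fitted values; and r_a = r_ar = 0 during the brief test stimulus, in line with the assumption that a single test presentation causes negligible adaptation.

```python
def f(I, theta=0.3):
    """Power-law loudness function f(I) = I**theta (theta = 0.3 is illustrative)."""
    return I ** theta


def steady_state(I1, I2, N01=1.0, N02=1.0, r=5.0, r_i=25.0, n=2,
                 r_a=0.0, r_ar=0.0, tol=1e-12):
    """Solve Equations 5-27 and 5-28 jointly for (N1, N2), with h(N) = N**n,
    by bisection on N1 over [0, N01]."""
    F1, F2 = r * f(I1), r * f(I2)

    def N1_given(N2):  # Equation 5-27
        return F1 * N01 / (F1 * (1 + r_ar) + (1 + r_a) * (r_i * N2 ** n + 1))

    def N2_given(N1):  # Equation 5-28
        return F2 * N02 / (F2 * (1 + r_ar) + (1 + r_a) * (r_i * N1 ** n + 1))

    lo, hi = 0.0, N01
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid < N1_given(N2_given(mid)):
            lo = mid
        else:
            hi = mid
    N1 = 0.5 * (lo + hi)
    return N1, N2_given(N1)


def adapted_pools(I1a, I2a, r_ar, N01=1.0, N02=1.0):
    """Equation 5-34 applied to both populations: cells still available after
    prolonged exposure to an adaptor with relative intensities (I1a, I2a)."""
    N1a, N2a = steady_state(I1a, I2a, N01, N02, r_ar=r_ar)
    return N01 - r_ar * N1a, N02 - r_ar * N2a


def boundary(N01=1.0, N02=1.0, tol=1e-9):
    """Test composition alpha at which N1 = N2, with I1 = alpha, I2 = 1 - alpha.
    Adaptation caused by the brief test itself is ignored (r_a = r_ar = 0)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        N1, N2 = steady_state(mid, 1.0 - mid, N01, N02)
        if N1 < N2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


unadapted = boundary()
# Hypothetical pure /dae/ adaptor: assumed to drive population 2 only.
N01_adapted, N02_adapted = adapted_pools(I1a=0.0, I2a=1.0, r_ar=0.2)
adapted = boundary(N01_adapted, N02_adapted)
print(f"unadapted boundary: {unadapted:.3f}, after /dae/ adaptation: {adapted:.3f}")
```

Under these assumptions the /dae/ adaptor moves the boundary toward the /dae/ end of the continuum. A composite adaptor of composition α_a would be handled in the same way by calling `adapted_pools(alpha_a, 1 - alpha_a, r_ar)` before computing the boundary.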

5.5.3 Fitting the /dae/-Adaptor Data

It  is  convenient  to  fit  the  data  from  Experiment  9 
first  since,  if  the  model  is  correct,  the  boundary  shifts 
for  the  composite  adaptor  can  be  predicted  from  that  due  to 
either  adaptor  alone.  The  fitting  process  is  carried  out 
essentially  as  described  for  the  discrimination  data  in 
Section 5.4. The same method of least squares was employed: the sum of squares difference between the model and the data was calculated for each of the two subjects. To compute the predicted boundary shifts, only r_ar was allowed to vary. All other parameters remained set as for the AX discrimination data in Section 5.4. The adapted boundaries were obtained by computing the reduced number of excitable cells from Equation 5-34 for each /dae/ adaptor of intensity I_a, and the solution N1 = N2 was then determined iteratively.

The results of the least squares fit are shown in Fig. 5.12. Because of the variability of the boundary shifts for this experiment (which for individual subjects appears to be typical; see Ward et al., 1958), it is difficult to decide whether or not this model demonstrates the correct behaviour. Since only one parameter was allowed to vary during the fitting process, the range of behaviour of the model is restricted. However, calculations show that letting other parameters vary (e.g., r) does not substantially affect the behaviour of the model, and therefore the predicted curves of Fig. 5.12 are representative of this model. As with the AX discrimination data, part of the difficulty arises from the fact that the power law behaviour of loudness does not hold at low intensities, within 20 dB or so of threshold (Hellman and Hellman, 1975; Hellman and Zwislocki, 1963). In these studies, loudness grows as a power law with a larger exponent in the low intensity range.

Fig. 5.12. Best-fit adaptation curves for the case of a /dae/ adaptor. The solid line represents the n=1 model and the dashed line the n=2 model. (a) Subject DS; (b) subject JH. The vertical bar is an estimate of the limit error based on the observed reproducibility of the unadapted boundary.

⁴ This assumes that there is no bias. However, any bias due to repeated presentation of an anchor stimulus will result in a shift in the same direction, and thus in this model will be indistinguishable. From the results of Simon and Studdert-Kennedy (1978) it would appear that anchor effects are generally smaller than effects due to adaptation. For this reason, and lacking a component of the model which dictates exactly how the boundary would shift due to bias, the bias is taken to be zero.

Hellman and Hellman suggest a modification of the power law
behaviour of the form


f  (I) 


I0(6-e-(1  +  Tl)5£nB) 


(5-35) 


in order to account for this low intensity deviation. (Hellman and Hellman determine that 0.9 and 2.5 are representative values of β and η, respectively.) This equation is based on Zwislocki's (1973) model of the firing rates of sensory neurons, and modifies the form of the loudness function for intensities within about 20 dB of threshold. This modification also has the desirable property that it changes the slope of the function f(I) at I = 0 from infinity to zero.

There is yet another reason for suspecting the low intensity behaviour of f(I). It was pointed out above that, owing to the rapid loss of signal fidelity for low intensity signals (fewer bits are available for the digital representation of the signal), below approximately 20 dB or so the signal behaves as if more noise were being added to it. In terms of the present experimental paradigm, then, the effective threshold is perhaps only 30-40 dB down from the maximum signal intensity. For these reasons, it is likely that the low intensity behaviour of the composite /bae/-/dae/ signals is not adequately characterized by f(I) = I^θ. The calculated boundary shifts shown in Fig. 5.12 are consequently too large for low intensity adaptors. To test this, a modification to f(I) was attempted:

$$f(I) = I^\theta\left(1 - e^{-\beta I^2}\right) \tag{5-36}$$

which is a simplified implementation of the Hellman and Hellman (1975) correction (Equation 5-35). Letting β be a free parameter, the adaptation model was re-computed, and the results are shown in Fig. 5.13. Some improvement in the fit of the model is observed, inasmuch as it is possible to tell with these data. Effectively, significant boundary shifts do not occur until the adaptor has an intensity of 60 dB SPL or so. This is roughly in accordance with measurements of TTS, which show that significant TTS is not produced until the adaptor intensity is somewhere in the range 60-80 dB (Selters, 1964; Ward et al., 1958).

Fig. 5.13. Best-fit adaptation curves for the case of a /dae/ adaptor when the low-intensity correction for N(α) is included (calculated for the n=2 model only). (a) Subject DS; (b) subject JH.
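The effect of the low-intensity correction on the slope at I = 0 can be checked numerically. The sketch below takes Equation 5-36 in the form f(I) = I^θ(1 - e^(-βI²)) and compares its derivative, obtained by the product rule, with that of the uncorrected power law. The values θ = 0.3 and β = 50 are illustrative assumptions, not the fitted values. Near I = 0 the corrected slope falls toward zero instead of diverging, while at full intensity the two functions coincide.

```python
import math


def f_power(I, theta=0.3):
    """Uncorrected power law f(I) = I**theta (theta is illustrative)."""
    return I ** theta


def f_corrected(I, theta=0.3, beta=50.0):
    """Equation 5-36, f(I) = I**theta * (1 - exp(-beta * I**2)); beta is illustrative."""
    return I ** theta * (1.0 - math.exp(-beta * I * I))


def slope_power(I, theta=0.3):
    return theta * I ** (theta - 1.0)


def slope_corrected(I, theta=0.3, beta=50.0):
    # Product rule applied to Equation 5-36.
    e = math.exp(-beta * I * I)
    return theta * I ** (theta - 1.0) * (1.0 - e) + I ** theta * 2.0 * beta * I * e


for I in (1e-3, 1e-2, 0.1, 0.5, 1.0):
    print(f"I = {I:<6g} power-law slope = {slope_power(I):9.3f}  "
          f"corrected slope = {slope_corrected(I):7.3f}")
```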

The values of r_ar extracted during the fitting process were 0.13 for subject DS and 0.23 for subject JH. These values appear low by at least an order of magnitude, since Keeler's (1968) estimated time constants show that r_ar should be approximately 2. It is not clear why this is so. The fact that only one "adaptation level" was included in the model is not likely to account for such a large deviation from the expected result. However, since r controls the overall level of excitation in the neural population, it likewise controls the number of adapted cells. If r is increased, N1 is increased. Thus, since N_a1 = r_ar N1 (Equation 5-26), for a constant N_a1, if r is increased, then r_ar must decrease. With this model, similar dispersion can be produced for various combinations of parameters, and it is difficult to anchor any one of them absolutely. So, for the present purposes it is sufficient to demonstrate that the model is capable of producing the right behaviour; more sophisticated experiments will be required to elicit more accurate values of the parameters. One such experiment would involve measuring the change in the category boundary as a function of time, since this would allow recovery of the time constants.

5.5.4 Fitting the /bae/-/dae/ Adaptor Data

Although the fit of the /dae/ adaptor model leaves much to be desired, approximately the right behaviour is produced. The important question now is whether or not the boundary shifts for the composite /bae/-/dae/ adaptor (Experiment 8) can be predicted on the basis of either adaptor alone. The model for the composite adaptor is virtually identical to that for the single adaptor above. The only difference is that for each adaptor composition α_a, both N'01 and N'02 are computed from Equation 5-34 (using α_a and 1 - α_a). The predicted boundary shifts are then calculated by iterating to find the value of α for which N1 = N2. The calculated boundary shifts for the composite adaptor case are shown in Fig. 5.14.

Again, the model performs better for subject JH (whose boundary shifts were averaged with those for GR) than for subject DS. The most important feature of the calculated curve for n=2 is that it shows the desired inflection. The n=1 model, on the other hand, shows incorrect curvature, and is therefore eliminated from further consideration. (Calculations show that this curvature persists for the n=1 model for any range of parameters.) The fact that the n=2 model underpredicts the boundary shifts is not particularly disturbing in the light of Fig. 5.12. The failure of the model to properly account for the low intensity behaviour propagates into the composite-adaptor model as a reduction in the boundary shift for α_a close to 0 or 1 (i.e., pure /dae/ or pure /bae/ adaptors). The discrepancy is also enhanced by the fact that for subject DS, the boundary shift for α_a = 0 is less than that for α_a = 0.1 or α_a = 0.2. For subject JH, the /dae/-adaptor fit (Fig. 5.12) does not produce a shift for α_a = 0 as great as the approximate asymptotic values of Fig. 5.13, which prevents the /bae/-/dae/ boundary shift curve from attaining the large shifts necessary to improve the overall fit. Improvement in the fit of the model can be achieved by increasing r_ar.

Fig. 5.14. Calculated boundary shifts for the case of a composite /bae/-/dae/ adaptor for the n=1 (solid lines) and n=2 (dashed lines) models. (a) Subject DS; (b) subject JH.

The slight upturn of the predicted boundary shift curve for α_a close to 1 (and downturn close to α_a = 0) again results from the fact that the power law f(I) = I^θ has an infinite slope at I = 0, which again suggests that f(I) is primarily at fault. Using f(I) defined by Equation 5-36 above, the boundary shifts were re-calculated, and the predicted boundary shifts are shown in Fig. 5.15. The predicted shifts are somewhat closer to the measured results, which means that the modification required to improve the fit of the /dae/-adaptor shifts at the same time improves the fit of the composite-adaptor data. Within the context of this model, then, the view is supported that the effects of the composite adaptor are basically due to those of /bae/ and /dae/ adaptors acting independently, and that the curvature of the boundary shift curve (e.g., Fig. 5.15) results from the effect of the inhibition. In one sense, the effect of the inhibition is to make the system operate in a more "phonetic-like" manner.

Fig. 5.15. Calculated boundary shifts for the case of a composite /bae/-/dae/ adaptor when the low-intensity correction for N(α) is included. (a) Subject DS; (b) subject JH.


5.6 SUMMARY


Both the discrimination model and the selective adaptation model were computed allowing only one or two parameters to vary. This places severe demands on the model, and the fits could certainly be improved by allowing more parameters to vary. This would not accomplish much, however, since the status of many of the assumptions made in the formulation of the model is unclear. In the present instance, the intent is not to derive accurate estimates of the parameters, since the value of a parameter can only be as secure as the assumption which brings it into the model. What is of more interest is whether or not the model behaves in the right way. Since the simple model assuming strict power law dependence on intensity (Equations 5-2) cannot account for the observed data, the task is then to find the "just necessary additional conditions" to give the model the correct form.
The model has been cast in terms of hypothetical neural populations in order to use linear systems analysis. At best, it is a functional model and represents "creative neural modelling". Even so, the model has a certain explanatory value. While it cannot be unambiguously claimed that this model "proves" that /b/ and /d/ processing is carried out via essentially separate channels of analysis, it certainly appears as if this is the case. Mutual inhibition is perhaps only one of many "just necessary additional conditions", but it is a plausible inclusion in the model. It provides the suppression of the weaker signal component which seems to be indicated in Experiments 1 through 8, and it provides the dispersion of the α-continuum which is necessary to account for the discrimination results. However, more experiments than these will be needed to establish whether or not this is a reasonable analysis.

The present model assumes that the adaptation effects occur as a result of desensitization of a specialized neural population. Consequently, it is consistent with a "detector theory" model of selective adaptation, but only indirectly. Since the model assumes that the /b/ and /d/ components of the composite signal are functionally orthogonal, whatever neural entities are responsible for the recognition of /b/ and /d/ can be treated as static templates. Thus, although the hypothetical neural populations assumed in the model are "detectors", they are so only in a limited sense. Nothing in the model claims that such physically distinct neural ensembles exist, only that within the present specialized circumstance they respond as if they were distinct. It is a large step from these limited detectors to detectors which span some continuum such as the frequency of F2.

5.7  Extension  to  Dichotic  Listening 

Since discrimination and selective adaptation paradigms are two of the major experimental methodologies of speech perception studies, it is worthwhile to consider whether the present monaural fusion paradigm can aid in the interpretation of a third major source of speech perception data, binaural fusion. As will be seen, these data place further demands on the model, and although the model proves inadequate to account for their complexity, it forms a strong basis for interpreting the results. Its principal virtue in the dichotic paradigm is to provide a basic framework for investigating the additional complexity required to account for central integration of monaural acoustic cues.


CHAPTER  6 


BINAURAL FUSION


Depending on the type of stimuli which are presented dichotically, the resulting percept may be either fused or unfused. (The stimuli are said to be "fused" when only a single entity is perceived and "unfused" when two separate entities are perceived.) Fusion of CV syllables generally occurs if the stimuli are temporally aligned and have the same fundamental frequency (Repp, 1976). The identity of the fused percept usually corresponds to one of the two stimuli which are presented, but is sometimes a phonetic mutant thereof (Cutting, 1976). Most experiments have involved dichotic contrasts of stimuli which differ in their spectral structure (see Cutting, 1976, for a taxonomy of the paradigms), with the two stimuli being presented at equal intensities. Berlin, Lowe-Bell, Cullen, Thompson and Stafford (1972), however, using dichotically presented nonsense syllables at various interaural intensities, show that a right ear superiority can exist even when the right ear signal is attenuated 15 dB or more below the left ear signal level. This indicates that ear dominance, for certain pairs of stimuli, may be reflected as a differential sensitivity to monaural inputs at some central site. Repp (1976) suggests that the degree of central interaction should be sensitive to the relative interaural intensities of the acoustic cues and should also depend on the perceptual distance of the stimuli from their "prototype" values. In the monaural fusion paradigm under investigation, stimulus composition is controlled by the relative intensity of the two signal components, and by using simultaneous monaural/binaural fusion it should be possible to investigate more directly the role of relative interaural intensities in the central integration of speech cues.

6.1 EXPERIMENT 10: DICHOTIC PRESENTATION

The dichotic experiment described in this chapter consisted of carrying out Experiment 1 simultaneously in both ears, but reversed for the left ear. That is, as α varied from 0 to 1 in the right ear, it simultaneously varied from 1 to 0 in the left ear. Thus, as the /b/ component increased in the right ear, it simultaneously decreased in the left. The converse was true for the /d/ component. Fig. 6.1 illustrates the presentation of the stimuli.

This design was intended to test the sensitivity of binaural recognition to subtle differences in interaural /b/-/d/ ratios by making the combined signal (i.e., the sum of the left and right ear intensities) contain an approximately equal binaural intensity of /b/ and /d/ for all values of α. Expressed in equation form, the stimuli presented to the right and left ears were

$$ s_R = \alpha\, s_b + (1-\alpha)\, s_d \qquad (6\text{-}1a) $$

$$ s_L = (1-\alpha)\, s_b + \alpha\, s_d \qquad (6\text{-}1b) $$

where s_b and s_d represent the time waveforms of the /b/ and /d/ formant transitions.


As an additional control parameter, the interaural intensity difference, ΔI, was also varied. For a given value of ΔI (in dB), the right and left ear stimulus combinations were scaled by factors w_R and w_L, which were calculated from ΔI according to

$$ w_R^2 = \left(1 + 10^{-\Delta I/10}\right)^{-1} \qquad (6\text{-}2a) $$

and

$$ w_L^2 = 1 - w_R^2 \qquad (6\text{-}2b) $$

(i.e., ΔI = 10 log10(w_R²/w_L²)). The signal combinations for the right and left ears were then scaled by w_R and w_L to yield

$$ s_R' = w_R\, s_R \qquad (6\text{-}3a) $$

$$ s_L' = w_L\, s_L \qquad (6\text{-}3b) $$

The scaling factors w_R and w_L calculated in this fashion maintained approximately equal binaural loudness independently of the value of ΔI. The twenty-one intensity-adjusted stimuli used in the ABX and AX discrimination studies were also used in this experiment to equalize the overall intensity of the composite stimuli.

Fig. 6.1. Schematic arrangement for dichotic presentation of composite /bae/-/dae/ stimuli
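The stimulus construction in Equations 6-1 to 6-3 can be sketched as follows. This is a minimal illustration, not the playback software used in the experiment; the helper names (`interaural_weights`, `dichotic_stimuli`) and the toy sinusoids standing in for the /b/ and /d/ formant transitions are hypothetical.

```python
import math

def interaural_weights(delta_i_db):
    """Return (w_R, w_L) from Equations 6-2 for an interaural difference in dB."""
    w_r2 = 1.0 / (1.0 + 10.0 ** (-delta_i_db / 10.0))  # Eq. 6-2a
    w_l2 = 1.0 - w_r2                                   # Eq. 6-2b
    return math.sqrt(w_r2), math.sqrt(w_l2)

def dichotic_stimuli(alpha, delta_i_db, s_b, s_d):
    """Right/left ear waveforms per Eqs. 6-1 and 6-3 (s_b, s_d: equal-length sample lists)."""
    w_r, w_l = interaural_weights(delta_i_db)
    right = [w_r * (alpha * b + (1.0 - alpha) * d) for b, d in zip(s_b, s_d)]  # w_R * s_R
    left = [w_l * ((1.0 - alpha) * b + alpha * d) for b, d in zip(s_b, s_d)]   # w_L * s_L
    return right, left

# Hypothetical toy waveforms standing in for the /b/ and /d/ formant transitions
s_b = [math.sin(2 * math.pi * 0.05 * n) for n in range(100)]
s_d = [math.sin(2 * math.pi * 0.08 * n) for n in range(100)]

w_r, w_l = interaural_weights(-10.0)
print(round(10 * math.log10(w_r ** 2 / w_l ** 2), 6))  # -10.0
right, left = dichotic_stimuli(0.3, -10.0, s_b, s_d)
```

Because w_R² + w_L² = 1 for every ΔI, the total binaural intensity stays fixed while only its interaural split changes.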

Each run was conducted using a particular value of ΔI and consisted of ten presentations each of the twenty-one dichotic /bae/-/dae/ combinations. ΔI ranged from -25 dB to +25 dB in 5 dB steps, and the left monaural (ΔI = -∞) and right monaural (ΔI = +∞) cases were also included. All aspects of the playback circuitry which might affect the right and left ear intensity differences were controlled. Matched amplifiers, filters and headphones were used to ensure equal fidelity of both playback channels and to minimize the amount of crosstalk.¹ The presentation level (80 dB SPL) was checked prior to each session and the headphones were checked for balance. The session-to-session variability was less than ±0.3 dB. Only one subject participated at a time, wearing a matched set of TDH-49 headphones (Fig. 4.8), always in the same orientation. All aspects of interstimulus timing, response collection and tabulation were as previously described for Experiment 1.

The subject responded to each stimulus presentation as either /bae/ or /dae/ (or /bae/-like vs. /dae/-like).

The five subjects of Experiment 1 participated; all were right-handed. Three runs for each value of ΔI were obtained from subjects GR, JH and DS, and two from subjects GM and PA. Four or five runs were generally carried out in any particular session. The first run of any session was either left or right monaural (i.e., ΔI = -∞ or +∞) and served as a practice run for the session.

The perceived stimuli were always fused, and some of the stimuli sounded like a clear /bae/ or /dae/. Most, however, had a rather elusive identity. There was the strong sensation of being able to perceive both /bae/ and /dae/ simultaneously, with the /b/ occasionally appearing to lead the /d/ slightly. Controlling overt response bias was very difficult for all subjects, and the individual runs showed considerable variability. The subjects were frequently reminded to try as hard as possible to maintain a constant decision criterion.

¹ It was impossible to eliminate crosstalk entirely. The separation of filters and amplifiers had some effect, but the crosstalk at the headphones (as measured by a B&K artificial ear) was still only 26 dB below the signal (-26 dB).

The percentage /bae/ responses for all five subjects are shown as three-dimensional surfaces in Fig. 6.2. A few representative individual runs are shown in Fig. 6.3, and the averaged runs for subject GR are shown in Fig. 6.4. The two independent variables for the response surfaces of Fig. 6.2 are α, the fraction of /b/ in the right ear, and ΔI, the interaural intensity difference. The profiles for ΔI = -∞ and ΔI = +∞ represent monaural identification runs and show similar, but reversed, identification curves.

The monaural identification runs were fitted to normal ogives as previously described for Experiment 1 in order to obtain estimates of the category boundaries, I_50. A three-way analysis of variance was performed by concatenating the left and right ear monaural runs of Experiment 1 to those of Experiment 10. The monaural identification runs of Experiment 1 differed from those of Experiment 10 only in that the opposite earphone was open-circuited to prevent any crosstalk. This condition will be referred to as M1 (one-earphone monaural) and the monaural runs of Experiment 10 will be referred to as M2 (two-earphone monaural). The predictor variables for the ANOVA were SUBJECT, EAR and CONDITION, where CONDITION was either M1 or M2. Both SUBJECT (p < 0.001) and CONDITION (p < 0.001) were found to be significant. EAR was not significant, and there were no significant interactions. SUBJECT differences have already been discussed in Section 3.2.2 and will not be considered further. Calculation of the mean boundaries for the M1 and M2 conditions for both left and right ear cases revealed that in all but one case the M2 condition produced a value of I_50 which was 0.5 to 1.2 dB higher than that of the corresponding M1 condition. The only exception occurred for subject JH for left ear M1 and M2 runs; in that case I_50 was smaller for the M2 condition by 0.5 dB. Overall, the M2 condition was 0.82 dB higher than the M1 condition for the right ear presentations, and 0.76 dB higher for the left ear presentations. At present no explanation can be offered for this systematic shift.

Fig. 6.2. /bae/ identification curves as a function of interaural intensity, ΔI, for the five subjects. "RM" and "LM" mark the conditions for right monaural and left monaural presentations, respectively. The shaded profiles indicate the binaural identification curves (i.e., ΔI = 0 dB). (Vertical axis: percent judged /bae/.)

Fig. 6.3. Typical /bae/ identification curves for two subjects. (a) subject GR (b) subject GM. (Vertical axis: percent judged /bae/.)

Fig. 6.4. Averaged /bae/ identification curves for subject GR for values of ΔI ranging from -∞ (left monaural) to +∞ (right monaural).
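The boundary estimates above come from normal-ogive fits, following the Experiment 1 procedure, which is not reproduced here. As a hedged sketch only, the classical probit shortcut below regresses inverse-normal-transformed proportions on the stimulus value and reads off the 50% point. A maximum-likelihood fit that weights each point by its binomial variance would be more rigorous. The function name and the clipping constant `eps` are illustrative assumptions.

```python
from statistics import NormalDist

def probit_boundary(x, p, eps=0.01):
    """Estimate the 50% point and spread of a normal ogive fitted to identification data.

    Classical probit approach: transform each proportion to a z-score with the inverse
    normal CDF and fit the straight line z = (x - boundary) / sigma by least squares.
    Proportions are clipped to [eps, 1 - eps] so that 0 and 1 map to finite z-scores.
    """
    nd = NormalDist()  # standard normal
    z = [nd.inv_cdf(min(max(pi, eps), 1.0 - eps)) for pi in p]
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    boundary = -intercept / slope  # x at which z = 0, i.e. p = 0.5
    return boundary, 1.0 / slope   # sigma < 0 for a falling identification curve

# Synthetic noise-free data from an ogive with boundary 0.45 and spread 0.1
xs = [0.30 + 0.05 * k for k in range(7)]
ps = [NormalDist(0.45, 0.1).cdf(xi) for xi in xs]
print(probit_boundary(xs, ps))
```

For noise-free data drawn from a normal ogive, the regression recovers the generating boundary and spread; with real response counts, the clipped end points carry little information and are the main weakness of this shortcut.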

6.2  EAR  DOMINANCE 

The profiles for ΔI = 0 dB (the shaded sections of Fig. 6.2) represent the perfectly binaural case, in which the intensities of the composite signals delivered to the left and right ears were equal. It can be seen from Fig. 6.2 that for the perfectly binaural case the responses generally follow the stimulus in the right ear, indicating right ear dominance. The exception is subject JH, who evidently shows little or no ear dominance. Subjects GR, DS and GM all found the task most difficult around ΔI = -20 dB (i.e., with the right ear signal 20 dB below the left ear signal). As can be seen from Fig. 6.2, this is the point where the recognition scores stopped following the right ear signal and began following the left ear signal. Thus, it appears that for these subjects the point of no ear dominance is in the vicinity of -20 dB, which is 5 to 10 dB less than that found by Berlin et al. (1972). Subject JH found all non-monaural runs to be difficult, evidently due to his lack of ear dominance.

To extract more information on possible ear dominance, the data for each subject corresponding to α = 0 and α = 1 were plotted as a function of ΔI (Fig. 6.5). These curves (which correspond to the sides of the 3-D surfaces in Fig. 6.2) show the percentage of responses when (a) a pure /bae/ is presented to the right ear and a /dae/ to the left, and (b) a pure /dae/ is presented to the right ear and a /bae/ to the left.²

The identification curves of Fig. 6.5 show the same trend as the ear dominance curves for dichotic chords obtained by Efron and Yund (1975; see also Yund and Efron, 1975). In the Efron et al. experiments, dichotic "chords", each consisting of two tones of slightly differing frequencies (e.g., 1650 and 1750 Hz), are presented to the two ears. In a two-interval discrimination paradigm, the lower tone is presented to the left ear simultaneously with the higher tone in the right ear, followed by the reverse configuration. The subject's task is to identify the interval which contains the higher pitch. Efron and his co-workers have obtained ear dominance curves for these dichotic chords and have summarized their findings as a family of ear dominance curves, shown in Fig. 6.6 (Yund and Efron, 1975). For right ear dominance, ear dominance curves such as E¹ or E² are obtained, where the superscript indicates increasing ear dominance. According to this classification scheme, subjects GR, DS and GM show pronounced right ear dominance. Curve "0" represents the case of no ear dominance. The major feature of such curves is a broad intensity-independent plateau, typically extending ±30 dB. Subject JH in Fig. 6.5 evidently shows such a plateau, although in his case it extends only ±10 dB. A curious feature of the curves of Fig. 6.5 is the dip around ΔI = +10 dB (with the possible exception of subjects JH and PA). The data of Efron, Dennis and Yund (1977) show just such a perturbation for hemispherectomized subjects (as well as for a significant number of normal subjects), also at approximately +10 dB. The origin of this dip is unclear.

² Hereafter these two conditions will be referred to as "/b/-/d/" and "/d/-/b/" respectively.

Fig. 6.5. Ear dominance curves for the /bae/-/dae/ condition (dashed lines) and /dae/-/bae/ condition (solid lines). (Vertical axis: percent judged /bae/ or /dae/.)

Fig. 6.6. Typical ear dominance curves for dichotic chords (after Yund and Efron, 1975). (Vertical axis: responses.)


Another feature of Fig. 6.5 is the difference between the /b/-/d/ and /d/-/b/ curves: the /b/-/d/ curve is systematically lower than the /d/-/b/ curve for all subjects except possibly GM. This difference between the two ear dominance curves represents "stimulus dominance", and can be attributed to a different ear dominance for the two stimuli (Repp, 1977). The present data indicate that, in general, there was more right ear dominance for the /dae/ than for the /bae/. Repp (1976) suggests an index of stimulus dominance, φ_D (Equation 6-4), and

$$ \varphi_E = T_{RE}(i) - T_{LE}(j) \qquad (6\text{-}5) $$

as an index of ear dominance (where (i) = /b/ and (j) = /d/). T_RE(i) represents the fraction of /b/ responses when /b/ was presented in the right ear, and T_LE(j) is the fraction of /d/ responses when /d/ was presented in the left ear (see Repp, 1976, for a full definition of the variables). These indices are typically computed for the perfectly binaural case (equal intensities at the two ears). The stimulus dominance index, φ_D, is related to the difference between the /b/ and /d/ scores (in the present example) for a given ear, and the ear dominance index, φ_E, is related to the difference between the ear scores for a given stimulus. If φ_E is 1.0, the subject is right ear dominant, and if it is -1, he is left ear dominant. Presumably, when φ_E = 0, the subject has no ear dominance for that particular pair of stimuli. Table 6-1 below summarizes the two indices for the five subjects, as calculated from Equations 6-4 and 6-5 for the points on the ear dominance curves corresponding to ΔI = 0 dB.
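Equation 6-5 as printed can be evaluated directly from two response fractions. The sketch below does only that. Repp (1976) gives the full definition, including how the two stimulus configurations are combined, and that definition is not reproduced here; the function name is hypothetical.

```python
def ear_dominance_index(t_re_i, t_le_j):
    """phi_E = T_RE(i) - T_LE(j), Equation 6-5.

    t_re_i: fraction of /b/ responses when /b/ was in the right ear.
    t_le_j: fraction of /d/ responses when /d/ was in the left ear.
    Returns a value in [-1, 1]: +1 right ear dominant, -1 left ear dominant.
    """
    for t in (t_re_i, t_le_j):
        if not 0.0 <= t <= 1.0:
            raise ValueError("response fractions must lie in [0, 1]")
    return t_re_i - t_le_j

print(ear_dominance_index(0.9, 0.1))  # positive value: right ear dominant
```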

TABLE 6-1

EAR AND STIMULUS DOMINANCE INDICES

SUBJECT    φ_E     φ_D
GR         0.81   -0.22
JH         0.19   -0.37
DS         1.00    0.00
GM         0.95   -0.23
PA         0.33   -0.82

Table 6-1 shows pronounced right ear dominance for subjects GR, DS and GM and a slight left ear dominance for subject JH. These results agree well with a visual comparison of Fig. 6.5 with the Efron et al. ear dominance curves in Fig. 6.6. The data for subject PA are equivocal, since the /d/-/b/ curve for this subject (Fig. 6.5) shows strong right ear dominance while the /b/-/d/ curve shows slight left ear dominance. Table 6-1 shows a slight right ear dominance for this subject, but this merely reflects the averaging of the information from the two ear dominance curves. Table 6-1 also shows that in general the /d/-/b/ configuration is dominant over the /b/-/d/ configuration (column φ_D). This is also consistent with Fig. 6.5, which shows that the /b/-/d/ curve is in general lower than the /d/-/b/ curve.

6.3 A MODEL OF BINAURAL INTERACTION

The monaural model (Section 5.2 etc.) can be extended to accommodate the binaural case following the suggestions of Repp (1976). Repp proposes a "multicategorical" model in which stimulus processing occurs in three stages: (a) auditory processing, (b) multicategorical processing and (c) a higher-level phonetic decision. The first stage, auditory processing, is identified in the present model as the initial transduction of stimulus energy and is represented by the excitation function f(I) in Equation 5-10. The second stage, "multicategorical processing", is the conversion into the excitation levels N_1 and N_2 of the two neural populations. The third stage is the decision process involving N_1 and N_2 and is represented by the N_1-N_2 decision plane. The Yund and Efron (1977) model for pitch salience of dichotic chords is identical with the present model at this level of description.

The  binaural  model  corresponding  to  Experiment  10  is 
represented  by  the  two  equations 


(6-6a) 


an  d 


(6-  6b) 


where N_2R, N_1R, etc. are the excitation levels of presumed peripheral (i.e., pre-fusional) /b/ and /d/ detectors and are given by Equations 5-10. N_1 and N_2 are the corresponding excitations at the central level. Weighting factors ω_1 and ω_2 represent ear dominance for the /b/ and /d/ stimuli respectively. (The factor ω_1/ω_2 will be discussed shortly.) If ω_1 or ω_2 is less than unity, the subject is right ear dominant for those stimuli; if greater than unity, he is left ear dominant.

In this model of dichotic interactions, the central integrating mechanism involves a linear combination of the monaural /b/ and /d/ excitation levels. This results in two new variables, N_1 and N_2, which are then plotted in a binaural decision plane whose form is identical to that of Fig. 5.1. This binaural model supersedes the monaural model developed in Chapter 5. Note, however, that when the input to either ear is zero, this model reduces to either a left or a right monaural model. Since the analysis of variance above shows that there are no significant boundary differences between the two ears, Equations 6-6 must predict the same category boundary when the input to either ear is reduced to zero. Now, in the absence of bias, the binaural category boundary is defined by

$$ N_2 = N_1 \qquad (6\text{-}7) $$


A monaural boundary occurs when either N_2L and N_1L are both zero, or N_2R and N_1R are both zero. In these cases, the boundaries are defined for a value of α such that

$$ N_{2R} = \frac{\omega_1}{\omega_2}\, N_{1R} \qquad (6\text{-}8) $$

and

$$ N_{2L} = \frac{\omega_1}{\omega_2}\, N_{1L} \qquad (6\text{-}9) $$


This equivalence of left and right category boundaries occurs because of the inclusion of the weighting factor ω_1/ω_2 in Equations 6-6. The significance of this property of the model is discussed in Section 6.3.2 below.


6.3.1  Calculating  Ear  Dominance  Curves 

Equations 6-6 are valid for dichotic presentation of composite /bae/-/dae/ stimuli. For the calculation of the ear dominance curves, however, no stimulus combinations are involved. Each of the two ears always contains either a pure /bae/ or a pure /dae/, the only variable being the interaural intensity difference, ΔI. For the /b/-/d/ condition, N_1 and N_2 become

$$ N_2 = N_{2R} \quad \text{and} \quad N_1 = \omega_1 N_{1L} \qquad (6\text{-}10) $$

since N_2L = N_1R = 0. Similarly, for the /d/-/b/ condition,

$$ N_2 = \omega_2 N_{2L} \quad \text{and} \quad N_1 = \frac{\omega_1}{\omega_2}\, N_{1R} \qquad (6\text{-}11) $$


Assuming, as in Chapter 5, that the maximum stimulus intensity is unity (i.e., α = 1 represents unit intensity of /b/), the amplitudes of the stimulus waveforms are simply given by the scale factors w_R and w_L (Equations 6-2). The central excitation levels are then (for the /b/-/d/ condition)

$$ N_2 = N_2\!\left(w_R^2\right) \qquad (6\text{-}12a) $$

$$ N_1 = \omega_1 N_1\!\left(1 - w_R^2\right) \qquad (6\text{-}12b) $$

where the functions N_1(·) and N_2(·) are given by Equations 5-10.

With Equations 6-12 it is now possible to compute the ear dominance curves. For the /b/-/d/ condition, the probability P(w_R) of a /b/ being identified (Equation 6-13, in which σ is the standard deviation of the noise in the binaural decision plane) can be calculated for all values of w_R. (A similar equation exists for the /d/-/b/ condition.) Fig. 6.7 shows representative ear dominance curves calculated from Equations 6-13 and 6-12 for various values of ω_1 (= ω_2). A consideration of the stimulus trajectory formed by varying w_R (or, equivalently, ΔI) shows immediately why this model cannot generate identification curves of the required form. Fig. 6.8 shows two stimulus trajectories, for ω_1 = ω_2 = 1.0 and for ω_1 = ω_2 = 0.3. In each case, the stimulus trajectory crosses the decision line only once, and hence always generates an identification curve with a single inflection. This model is therefore inadequate to account for the ear dominance curves of Fig. 6.5.
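The single-crossing argument can be illustrated numerically. The sketch below assumes a plain power law with an arbitrary exponent g = 0.3 in place of Equations 5-10 (which are not reproduced here), sets ω_1 = ω_2 = ω, and ignores decision noise; all function names are hypothetical.

```python
def w_r_squared(delta_i_db):
    """w_R^2 from Equation 6-2a."""
    return 1.0 / (1.0 + 10.0 ** (-delta_i_db / 10.0))

def excitation(intensity, g=0.3):
    """Hypothetical stand-in for Equations 5-10: a monotonic power law I**g."""
    return intensity ** g

def trajectory(omega, delta_grid):
    """(N1, N2) along the uncoupled /b/-/d/ trajectory of Eqs. 6-12, with omega_1 = omega_2 = omega."""
    points = []
    for d in delta_grid:
        w2 = w_r_squared(d)
        n2 = excitation(w2)                # Eq. 6-12a
        n1 = omega * excitation(1.0 - w2)  # Eq. 6-12b
        points.append((n1, n2))
    return points

def count_crossings(points):
    """Number of times the trajectory crosses the decision line N2 = N1 (Eq. 6-7)."""
    signs = [1 if n2 > n1 else -1 for n1, n2 in points if n2 != n1]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

grid = [0.5 * k for k in range(-80, 81)]  # Delta I from -40 dB to +40 dB
for omega in (1.0, 0.3):
    print(omega, count_crossings(trajectory(omega, grid)))  # one crossing each
```

Under this assumption N_2 rises and N_1 falls monotonically as ΔI increases, so N_2 - N_1 changes sign exactly once for any ω, and the identification curve can have only a single inflection.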


Yund and Efron (1977) suggest that energy from the contralateral ear via bone conduction accounts for the intensity independence of the ear dominance curves. In the present experiment the crosstalk between playback channels was -26 dB, which is substantially greater than the -50 dB of bone-conducted signal assumed by Yund and Efron.

Fig. 6.7. Calculated ear dominance curves for the cases of no crosstalk (solid lines) and -26 dB of crosstalk (dashed lines). The numbers on the curves represent values of ω_1.

Fig. 6.8. Stimulus trajectories for ω_1 = ω_2 = 1.0 (open circles) and ω_1 = ω_2 = 0.3 (closed circles). The arrows indicate the direction of increasing ΔI.
Crosstalk is easily incorporated into the present model by letting each monaural /b/ and /d/ detector receive C_t times the signal level of the /b/ and /d/ component of the opposite channel. For a signal level of w_R² in the right ear and 1 - w_R² in the left ear, the crosstalk produces the effective intensities shown in Fig. 6.9a. Equations 6-6 then become


+  “lN2(CtWR  3 


(6-  14a) 


N 1  ■  ^  Vl-wR2)  +  “lN2(CtC1-WR2))  (6'14t) 

where C_t is a factor representing the signal level from the opposite playback channel (for -26 dB, C_t = 0.0025). The ear dominance curves calculated assuming C_t = 0.0025 are shown as the dashed lines in Fig. 6.7. The primary effect of adding crosstalk to the model is a compression of the range of ear dominance. Therefore, crosstalk alone is insufficient to explain the intensity independence of the ear dominance curves.
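The value C_t = 0.0025 follows from expressing -26 dB as an intensity ratio, 10^(-26/10) ≈ 0.0025. The sketch below (function names hypothetical) performs that conversion and lists the effective detector inputs of Fig. 6.9a for the /b/-/d/ condition as described in the text. It does not attempt to reproduce Equations 6-14 themselves.

```python
def crosstalk_factor(level_db):
    """Intensity ratio corresponding to a crosstalk level in dB (e.g. -26 dB)."""
    return 10.0 ** (level_db / 10.0)

def peripheral_inputs(w_r2, c_t):
    """Effective intensities at the four peripheral detectors in the /b/-/d/ condition.

    The right ear carries /b/ at w_R^2 and the left ear /d/ at 1 - w_R^2; each detector
    also receives c_t times the matching component from the opposite channel (cf. Fig. 6.9a).
    """
    return {
        "b_right": w_r2,
        "d_right": c_t * (1.0 - w_r2),
        "b_left": c_t * w_r2,
        "d_left": 1.0 - w_r2,
    }

c_t = crosstalk_factor(-26.0)
print(round(c_t, 4))  # 0.0025
print(peripheral_inputs(0.5, c_t))
```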


Fig. 6.9. Activation of peripheral /b/ and /d/ "detectors" (a) including crosstalk (by either bone conduction or leakage across amplifier channels), and (b) coupling of /b/ and /d/ detectors. The shaded blocks symbolize the binaural decision plane (BDP). w_R is the weighting factor which controls the interaural signal levels (Equation 6-2).

There is still a major difference between the present model of binaural interaction and the model of Yund and Efron. The Yund and Efron model assumes that "... the energy delivered to the ear at one frequency spreads to excite channels that are optimally sensitive to nearby frequencies." (p. 610). A similar modification to include overlap of /b/-/d/ energies can also be made to the present model.³ The signal contributions to the peripheral /b/ and /d/ processors are shown in Fig. 6.9b. Equations 6-6 become in this case


N2(Wr2) 


(6  -1 5a) 


N  ' 

e 


U)  1  9  9 

—  N.  (l-w_  )  +  a).  N.  (C  wD  ) 
co  2  1  R  1  1  v  m  R  J 


(6-1  5b) 


where  C  is  a  constant  which  determines  the  amount  of 
coupling  between  detector  inputs.  When  Cm=0,  Equations 

6-15  reduce  to  Equations  6-6.  Calculated  ear  dominance 
curves  using  Equations  6-15  and  6-13  are  shown  in  Fig.  6.10 
for  various  values  of  (=  oo2).  These  identification 
curves  now  demonstrate  a  functional  form  similar  to 
Fig.  6.5.  The  reason  for  the  sudden  improvement  can  be 
understood  from  the  nature  of  the  stimulus  trajectory.  Fig. 
6.11  shows  that  the  stimulus  trajectory  may  now  cross  the 
decision  line  up  to  three  times,  depending  on  the  values  of 
co  and  oo2.  it  is  this  convoluted  behaviour  of  the  stimulus 


3  If  /b/  and  /d/  detectors  span  a  perceptual  continuum,  then 
the  assumption  of  coupled  detectors  is  equivalent  to 
assuming  that  one  detector  response  function  is  non-zero  in 
the  region  of  maximum  sensitivity  of  the  other- 
Fig. 6.10. Calculated ear dominance curves when coupling of
/b/ and /d/ "detectors" is assumed


Fig. 6.11. Stimulus trajectories for ω_1 = ω_2 = 1.0 for the case
of coupled detectors. The arrows indicate the
direction of increasing ΔI
trajectory which is responsible for the formation of the
inflection points observed in the ear dominance curves.
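The crossing argument can be made concrete with a small geometric sketch. Assuming, for illustration only, that the decision line passes through the origin of the decision plane at angle φ, each sampled point of a stimulus trajectory lies on one side of the line according to the sign of its perpendicular distance, and every sign change is a crossing. The function names and the zig-zag trajectory are hypothetical; the trajectory is synthetic, not output of Equations 6-15.

```python
import math


def signed_distance(x, y, phi):
    """Signed perpendicular distance from (x, y) to the line through the
    origin at angle phi (radians); positive below the line (x-axis side)."""
    return x * math.sin(phi) - y * math.cos(phi)


def count_crossings(trajectory, phi):
    """Number of times a sampled trajectory changes side of the decision line.
    Points exactly on the line are counted with the non-positive side."""
    sides = [signed_distance(x, y, phi) > 0 for x, y in trajectory]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


# Synthetic zig-zag trajectory (hypothetical, not model output);
# the 45-degree line corresponds to equal weights.
zigzag = [(1.0, 0.2), (0.2, 1.0), (1.0, 0.3), (0.3, 1.0)]
print(count_crossings(zigzag, math.pi / 4))  # 3
```

Each extra crossing of this kind shows up as an extra inflection when the identification probability is plotted against ΔI.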

These two modifications (mutual detector coupling and
crosstalk between playback channels⁴) now generate a family
of curves which have the required general form. To fit this
model to the data in Fig. 6.5, σ, the standard deviation of
the noise distribution in the binaural decision plane (e.g.,
Equation 6-13), was set to nominal values of 0.01 and
0.02. (σ = 0.02 provides a better fit for the model, since
increasing σ smooths the ear dominance curves. However, as
will be seen in Section 6.3.2, σ = 0.01 is a more appropriate
value, and the model is calculated for both values for
purposes of comparison.) C_m is unknown, so it was allowed
to vary. However, it was pointed out in Section 3.1 that
the correlation between the /b/ and /d/ formant transitions
was r = 0.32, so C_m should be of this order of magnitude.
Most likely it will be smaller, since the correlation between
the waveforms becomes greater as the formant transitions
approach the steady-state values of the vowel.

An  adaptive  least  squares  fit  of  the  model  defined  by 
Equations  6-15  and  6-13  was  carried  out  on  the  ear  dominance 


4 Crosstalk was included in the model anyway, since it was
physically present during the experiment (C_t was left set
to 0.0025). Addition of crosstalk produces only a minor
change in the behaviour of the model, essentially changing
the value of C_m for which a given family of curves occurs.
data shown in Fig. 6.5. The fitted curves are shown in Fig.
6.12 for σ = 0.02 (solid lines) and σ = 0.01 (dashed
lines). The extracted values of ω_1 and ω_2 for the five
subjects are presented in Table 6-2. The value of C_m
obtained from the fitting process was 0.06, which is
reasonable in view of the physical correlation between the
/bae/ and /dae/ waveforms.

TABLE 6-2

EAR DOMINANCE WEIGHTING FACTORS

                 σ = 0.01            σ = 0.02
SUBJECT      ω_1      ω_2        ω_1      ω_2
GF          0.516    0.496      0.514    0.502
JH          1.038    1.022      1.050    1.023
DS          0.528    0.507      0.527    0.516
GM          0.504    0.505      0.508    0.498
PA          0.538    0.503      0.540    0.505

Repeated least squares fits using different starting values
of ω_1 and ω_2 for the five subjects show that these
estimates are stable to within approximately ±0.02. The
values of ω_1 and ω_2 are least reliable for subjects GM and
PA, due to the poor fit of the model for these two subjects.
The location of the 50 percent point, i.e., the point at
which the ear dominance curve rises from 0 to 50 percent, is


Fig. 6.12. Best-fit ear dominance curves for σ = 0.02 (solid lines)
and σ = 0.01 (dashed lines)
the most reliable indicator of the amount of ear dominance
for right ear dominant subjects, and it is off by at least 5 dB
for these two subjects. Evidently, subject GM should show
moderate /bae/ dominance rather than the equal dominance
which the model predicts. (For subjects with little ear
dominance, the height of the intensity-independent "plateau"
is the best indicator.) Fitting the model to the ear
dominance curves is a stringent test of the binaural model,
since both the /b/-/d/ and /d/-/b/ ear dominance curves must
be fitted simultaneously. A change in either ω_1 or ω_2
affects both curves, so the fit reflects a compromise.
This limits the ability of the model to adapt
itself to the contour of the data, and consequently the fit
of the model varies considerably between subjects.

6.3.2 Predicting the Category Boundaries

The binaural model (Equations 6-6) was formulated on
the basis that monaural left and right ear identification
runs had to predict the same category boundary. This
assumption is equivalent to stating that the category
boundary is determined by the relative ear dominances of /b/
and /d/, specifically the ratio ω_1/ω_2. This follows from
the fact that a scaling of either axis of the decision plane
is equivalent to changing the angle of the decision line
(which is at 45 degrees for ω_1 = ω_2 = 1).⁵ In the absence of
response bias⁶, the category boundary for right-monaural
presentation is defined by the value of α for which


U) . 


CO  , 


Ni 


(6-16) 


Using the values of ω_1 and ω_2 from Table 6-2, Equation 6-16
was solved iteratively for α, and the resulting values of
I50 are shown in Table 6-3 (for σ = 0.01).
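The iterative solution can be sketched with plain bisection. The exact form of Equation 6-16 is not assumed here: the boundary condition in the sketch is a hypothetical stand-in, taking α as the point where the ω-weighted /b/ and /d/ inputs of a right-monaural composite balance. It further assumes the right ear carries (1 - α)·/bae/ + α·/dae/, a power law N(x) = x**p (p = 0.3 is arbitrary) for the detector response, and a dB scale 20·log10(α/(1 - α)), chosen only because it gives I50 = 0 at α50 = 0.5. All function names are illustrative.

```python
import math


def bisect_root(g, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of a continuous g on [lo, hi] by bisection.
    Requires g(lo) and g(hi) to have opposite signs (or one of them to be zero)."""
    g_lo, g_hi = g(lo), g(hi)
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("g must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid               # root lies in the lower half
        else:
            lo, g_lo = mid, g_mid  # root lies in the upper half
    return 0.5 * (lo + hi)


def boundary_alpha(w1, w2, c_m=0.06, p=0.3):
    """HYPOTHETICAL boundary condition: the alpha at which the weighted /b/ and /d/
    inputs of a right-monaural composite balance, with N(x) = x**p as a stand-in
    detector response."""
    def g(a):
        b_input = (1 - a) ** 2 + c_m * a ** 2   # /b/ energy plus coupled /d/ energy
        d_input = a ** 2 + c_m * (1 - a) ** 2   # /d/ energy plus coupled /b/ energy
        return w1 * b_input ** p - w2 * d_input ** p
    return bisect_root(g, 0.0, 1.0)


def level_db(alpha):
    """Assumed dB scale for the relative level of the two components."""
    return 20 * math.log10(alpha / (1 - alpha))


a50 = boundary_alpha(0.538, 0.503)   # omega values for PA, Table 6-2 (sigma = 0.01)
print(round(a50, 3), round(level_db(a50), 2))
```

With ω_1 = ω_2 the balance point falls at α = 0.5 by symmetry, and a larger ω_1 moves it above 0.5. Bisection is used only because it is robust when there is a single sign change; any standard root finder would serve.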


TABLE 6-3

PREDICTED /bae/-/dae/ BOUNDARIES

               MEASURED      PREDICTED
SUBJECT        I50 (dB)      I50 (dB)
GF              0.50           3.26
JH              1.57           1.23
DS              2.66           3.24
GM             -4.00          -0.03
PA             -0.52           5.61

5 The angle φ of the decision line is given by tan φ = ω_1/ω_2.

6 Changing response bias is equivalent to changing the angle
of the decision line. This will be indistinguishable from a
change in ω_1/ω_2 and therefore can be ignored.
The predicted boundaries are of the correct order of
magnitude (I50 = 0 corresponds to α50 = 0.5), and the
correlation between the two sets of boundaries is r = 0.39,
which indicates that some of the boundary placement is due
to the different ear dominances for the /bae/ and /dae/
stimuli. The correlation would have been higher but for the
poor fits of the model for subjects GM (/b/-/d/ curve) and
PA (/d/-/b/ curve).
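The quoted correlation can be checked directly against Table 6-3. The short sketch below computes the Pearson product-moment coefficient between the measured and predicted I50 values as transcribed from that table. It gives r ≈ 0.40, close to the r = 0.39 quoted above. With only five subjects, the coefficient carries wide uncertainty.

```python
import math

# (measured, predicted) I50 in dB, transcribed from Table 6-3
table_6_3 = {
    "GF": (0.50, 3.26),
    "JH": (1.57, 1.23),
    "DS": (2.66, 3.24),
    "GM": (-4.00, -0.03),
    "PA": (-0.52, 5.61),
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


measured = [m for m, _ in table_6_3.values()]
predicted = [p for _, p in table_6_3.values()]
r = pearson_r(measured, predicted)
print(f"r = {r:.2f}")  # r = 0.40
```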

To determine what the values of ω_1 and ω_2 should have
been, the monaural identification data (the ends of the
surfaces in Fig. 6.2) were fitted to the identification
model given by Equation 6-16 (with w_R replaced by α).
ω_1, ω_2 and σ were allowed to vary⁷, and C_m was set to 0.06.

TABLE 6-4

EAR DOMINANCE FACTORS FROM IDENTIFICATION DATA

SUBJECT      ω_1      ω_2       σ
GF          0.480    0.477    0.007
JH          0.993    0.966    0.008
DS          0.482    0.461    0.016
GM          0.490    0.522    0.011
PA          0.477    0.480    0.141

7 Actually, it is not possible to recover both ω_1 and ω_2 from
this fitting procedure; only the ratio ω_1/ω_2 can be found,
since this alone determines the location of the boundary.
The recovered values of the three parameters are shown in
Table 6-4 above (compare with Table 6-2). The average value
of σ from this procedure was σ = 0.011, which justifies
the choice of σ = 0.01 used above in fitting the ear
dominance data.

From the above results, the tentative conclusion can be
reached that the subjects' category boundaries are
determined, at least in part, by the relative amounts of ear
dominance for /b/ and /d/. It is attractive to think that
this is the principal reason, but a better model and more
detailed data will be required before this claim can be
substantiated. If it is true, then the monaural fusion
paradigm affords a simple way of measuring ear dominance for
a particular pair of stimuli.

6.4 PREDICTING THE DICHOTIC RESPONSES


All of the parameters necessary to describe the entire
data surfaces of Fig. 6.2 have been determined in Section
6.3 above. To calculate the predicted response surfaces,
Equations 6-13 and 6-6 are computed, with N_1' and N_2'
defined by


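$$N_2' = N_2\left(w_R^2\left[\alpha^2 + C_m(1-\alpha)^2\right]\right) + \omega_2 N_2\left(w_L^2\left[(1-\alpha)^2 + C_m\alpha^2\right]\right) \qquad (6\text{-}17a)$$

$$N_1' = N_1\left(w_R^2\left[(1-\alpha)^2 + C_m\alpha^2\right]\right) + \omega_1 N_1\left(w_L^2\left[\alpha^2 + C_m(1-\alpha)^2\right]\right) \qquad (6\text{-}17b)$$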
which are the generalizations of the coupled detector
Equations 6-15 for the case of arbitrary /bae/-/dae/
stimulus combinations. (These equations may be easily
formulated from a consideration of Fig. 6.9b; Equations
6-15 apply only for the special cases of α = 0 or α = 1.)
Using the fitted values of ω_1 and ω_2 from Section 6.3 and
σ = 0.01, the calculated response surfaces appear as shown in
Fig. 6.13. It is observed that for the right ear dominant
subjects (GF, DS and GM) the model has basically the
required form. The model produces a poor representation of
the data of subject JH, however. Part of this deviant
behaviour can be attributed to the inability of the model to
fit the ear dominance curves discussed previously. However,
the most serious defect of the model is its failure to
predict the two "ridges" observed in the JH data, as well as
its failure to predict the "bump" for subject GF. The model
performs moderately well for subject DS, but this is
strictly a result of the simple data structure for that
subject. The calculated response surfaces for subjects GM
and PA are far from their actual data surfaces, primarily
due to the incorrect values of ω_1 and ω_2 found previously.

The effect of the ratio ω_1/ω_2 can be seen in Fig. 6.13 as a
displacement of the inflection point for the left and right
monaural identification runs.
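The bookkeeping behind Equations 6-17 can be sketched as follows. The sketch reads those equations as saying that the right ear carries (1 - α)·/bae/ + α·/dae/ and the left ear the complementary mixture, that each component leaks into the opposing detector with weight C_m, that w_L² = 1 - w_R², and that left-ear contributions are scaled by ω_1 (for /b/) and ω_2 (for /d/). These are assumptions of the illustration, as is the power-law response N(x) = x**0.3; the thesis's own response functions derive from Equation 5-10. Function names are hypothetical.

```python
def detector_inputs(alpha, wR2, c_m):
    """Energy reaching each detector from each ear (illustrative reading of Eqs. 6-17).
    Assumes w_L**2 = 1 - w_R**2, right ear = (1-alpha)*/bae/ + alpha*/dae/,
    left ear = alpha*/bae/ + (1-alpha)*/dae/, cross-excitation weight c_m."""
    wL2 = 1.0 - wR2
    return {
        "b_right": wR2 * ((1 - alpha) ** 2 + c_m * alpha ** 2),
        "b_left": wL2 * (alpha ** 2 + c_m * (1 - alpha) ** 2),
        "d_right": wR2 * (alpha ** 2 + c_m * (1 - alpha) ** 2),
        "d_left": wL2 * ((1 - alpha) ** 2 + c_m * alpha ** 2),
    }


def detector_outputs(alpha, wR2, c_m, w1, w2, N=lambda x: x ** 0.3):
    """Coupled-detector outputs (N1', N2'); left-ear terms weighted by w1 and w2.
    N is a hypothetical stand-in for the detector response function."""
    e = detector_inputs(alpha, wR2, c_m)
    n1 = N(e["b_right"]) + w1 * N(e["b_left"])
    n2 = N(e["d_right"]) + w2 * N(e["d_left"])
    return n1, n2


# Sweep the interaural weighting for a fixed composite (GF values from Table 6-2)
for wR2 in (0.1, 0.5, 0.9):
    print(wR2, detector_outputs(0.3, wR2, 0.06, 0.516, 0.496))
```

Setting α = 0 or α = 1 gives the two-component special cases covered by Equations 6-15, and setting c_m = 0 removes the coupling terms altogether.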

The  model  captures  the  coarse  features  of  the  data  of 
Fig.  6.2  and  fits  the  edges  of  the  data  surfaces  (i.e.,  the 
Fig. 6.13. Calculated /bae/ identification curves for the five
subjects (compare with Fig. 6.2). "RM" and "LM"
mark the conditions for right monaural and left monaural
presentation, respectively
two ends and the two sides) fairly well. The point where
the responses stop following one ear and start following the
other is reasonably well predicted in all cases.
Interestingly, the slight shift in the transition region
between approximately ΔI = -5 and +10 dB for subjects GF,
DS and PA is also generated by the model. A slight bulge in
the surface occurs for subject GM, approximately where her
data also show a slight deviation. These are minor
effects in the model, but they would be major considerations
for a model which truly accounts for all dichotic interactions.

In summary, the model performs fairly well for
pronounced right ear dominant (and presumably pronounced
left ear dominant) subjects, and not at all well for
subjects with little or no ear dominance. Clearly the
no-ear-dominance case is the most stringent test of this (or
any other) model, and until the model can adequately account
for the effects observed in the no-ear-dominance case, the
model cannot be considered successful. Accounting for all
of the subject variations observed in the data of Experiment
10 will be a challenge for any model.

6.5 SUMMARY

The  binaural  model  developed  in  this  chapter  follows 
the  Yund  and  Efron  (1977)  model  for  pitch  salience  of 
dichotic  chords  and  shares  many  of  the  same  features. 
Repp's (1976) suggestions of "multicategorical processing",
in which peripheral excitations combine before a central
phonetic decision is made, appear to be basically
substantiated. His suggestion that a given stimulus
partially excites neighbouring detectors was found to be a
key ingredient in the model. Even though the model is not
entirely successful, it provides a basic measure of
understanding of why the data have the form they do. The
failures of the model themselves provide insight into what
information must be obtained to disambiguate between
possible models of dichotic interactions.

It is not exactly clear why the model does not perform
better. The lack of any dependence of σ on the interaural
intensity difference is one possibility, since the ear
dominance curves are fitted better if σ is allowed to become
larger. However, increasing σ simultaneously destroys the fit
for the rest of the model, and these contradictory requirements
of large and small σ indicate a serious shortcoming of the
model. One possible solution to the problem which has not
yet been investigated is to include inhibition, as was done
in Chapter 5. Enhancing the dispersion along the α-continuum
will allow σ to become larger and yet maintain
the same discriminability (as shown in Section 5.4).

Whether or not this will also generate the internal
structure of the dichotic response surfaces is unknown at
CHAPTER 7


SUMMARY  AND  CONCLUSIONS 


7.1 SUMMARY

In this thesis, several basic speech perception
phenomena have been investigated in an attempt to bring them
together under a single conceptual rubric. A new
experimental probe, monaural fusion, has been developed
for this purpose; it has several advantages over existing
experimental techniques. First, it uses real speech, and
thus avoids the contentious issue of whether or not subjects
can readily identify isolated stimuli as speech tokens
(e.g., Barclay, 1972). Because natural speech tokens are
used, recognition of the endpoint stimuli on the continuum
is perfect for all subjects, which considerably aids the
interpretation of the results. Second, the monaural fusion
paradigm manipulates a well-defined physical variable:
relative intensity. This permits an exact specification of
the independent variable, a variable often not identifiable
in speech perception studies when VOT or F1-F2 continua are
involved. ("Stimulus number" does not constitute a
well-defined independent variable.)
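As a concrete illustration of this independent variable, the sketch below forms the composite waveform s' = αs_1 + (1 - α)s_2 sample by sample and expresses the relative level of the two components in decibels. The dB definition, 20·log10(α/(1 - α)), is an assumption made here: it gives 0 dB at α = 0.5, matching the convention I50 = 0 ↔ α50 = 0.5 used in Chapter 6. The function names are hypothetical.

```python
import math


def mix(s1, s2, alpha):
    """Composite waveform s' = alpha*s1 + (1 - alpha)*s2, sample by sample."""
    if len(s1) != len(s2):
        raise ValueError("waveforms must have the same length")
    return [alpha * a + (1 - alpha) * b for a, b in zip(s1, s2)]


def relative_level_db(alpha):
    """Level of the s1 component relative to s2, in dB (assumed definition)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 20 * math.log10(alpha / (1 - alpha))


print(mix([1.0, 1.0], [0.0, 2.0], 0.25))   # [0.25, 1.75]
print(relative_level_db(0.5))              # 0.0
```

Because α is a single physical weight, every point on the continuum is fully specified by one number, unlike a "stimulus number" on a synthetic continuum.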

The four experimental paradigms investigated in this
thesis point to one general conclusion: the /bae/ and /dae/
components of a mixed /bae/-/dae/ stimulus are treated as if
they were delivered over essentially separate auditory
channels. This supports the view that /b/ and /d/ are
recognized by functionally separate neural entities, and
that this independence of processors is responsible for the
fact that this continuum is categorically perceived. Whether
or not these neural entities represent "detectors" which also
span the F1-F2 continuum cannot be decided with these data,
but the data are certainly consistent with this view. The
results of Experiment 10, by virtue of the model of
dichotic interactions, suggest that the two neural
populations are not quite separate. In the detector view of
/b/ and /d/ perception, this is equivalent to stating that
one detector response function has non-zero sensitivity in
the region of maximum sensitivity of the other (e.g., Fig.
2.12). This is consistent with the general view that in
order to account for the boundary shifts under selective
adaptation, the detector response functions must overlap.

The experimental investigations of Chapters 3 and 4 and
the subsequent modelling of Chapter 5 show that stimulus
combinations of the type described produce perceptual
effects which do not require the existence of hypothetical
phonetic levels of processing for their explanation. The
point of the model is not to deny the existence of phonetic
levels of processing, but merely to attempt to account for
these phenomena in terms of more basic psychophysical
processes. While it may be attractive to invoke phonetic
processing as an explanation for any phenomenon involving
speech or speech-like signals, this should be done only when
it can be successfully demonstrated that lower level
processing is insufficient to predict the observed effects.

The model of the perception of mixed /b/-/d/ stimuli
should be interpreted in this light. The use of the term
"detector" in this case is limited to mean a functionally
distinct neural ensemble which characterizes the nervous
system's response to the entire /b/ or /d/ acoustic
pattern. This form of detector does not necessarily
correspond to a "feature detector" per se, but may perhaps
represent the combined effects of complexes of feature
detectors. It is not perfectly clear what level of
processing these hypothetical neural populations are
supposed to represent. While they may represent dedicated
neural ensembles which monitor auditory input for certain
select acoustic configurations, this is not the only
possible interpretation. If every acoustic signal results
in a neural representation which is unique (e.g., a kind of
"neural spectrogram"), then this neural population may be
part of the internal signal representation itself. The
functional independence of the two neural populations then
directly follows from the degree of orthogonality of the
stimuli: two signals which do not have appreciable spectral
overlap (e.g., separated by one or more critical bands) will
likewise not have appreciable commonality of neural
excitations. The model developed in this thesis is invariant
to this assumption. In either interpretation, for the
purposes of modelling it is necessary to assume that some
global statistic (e.g., total excitation) is the variable
which participates in higher level decisions. This, of
course, is more a mathematical requirement for the
purposes of modelling than an imposition on the
nature of signal processing by the nervous system. Whatever
the nature of this global statistic, it represents a
condensation of information, and thus can be used to
represent the output of a "detector". Since it is a
continuous variable, it follows from this model that the
output of a detector is not a binary value, but rather has a
value which may be increased or decreased in either of two
ways: (1) the intensity of the stimulus may be increased or
decreased, or (2) the spectro-temporal composition of the
stimulus may be altered towards or away from the "prototype"
value. This implies a certain equivalence of the monaural
fusion paradigm of this thesis and the classic F1-F2
paradigm used in synthetic speech studies, and is one
possible explanation for the concordance of the effects
shown for discrimination, selective adaptation and binaural
fusion in this thesis and the same effects observed with
F1-F2 continua. If this is the case, then the model developed
in Chapters 5 and 6 is just a special case of a more general
model which incorporates a spectro-temporal continuum as
well.


The model was developed incrementally in Chapters 5 and
6 in an attempt to justify the inclusion of certain
components of the model. The essential features of the
model will be reviewed here. The basic model is that of a
simple two-detector configuration as shown in Fig. 7.1. It
was shown that this model did not demonstrate the required
dispersion unless a mutual inhibitory component was added.
The form of this inhibition is quite arbitrary, and the fact
that it produced the required dispersion cannot be
considered proof of its existence, since a suitable
modification of the general intensity response of the
detectors themselves (function f(I) in Equation 5-10) can
Fig. 7.1. Schematic processing of /bae/-/dae/ composite signals.

The shaded area represents possible coupling of the
/b/ and /d/ processors. Arrows represent possible
mutual inhibition of processor outputs
accomplish the same end. Thus, while there is evidence that
some form of interaction exists between the /b/ and /d/
processors, it is not necessarily inhibition. The important
point to note is that the requirement of enhanced dispersion
to account for the discrimination results simultaneously
imparts the correct curvature to the boundary shift curve of
the selective adaptation study (Section 5.5). Furthermore,
it was also shown that modifying the low intensity
behaviour of f(I) to eliminate the implausibly large
dispersion at low intensities was sufficient to further
increase the curvature of the boundary shift curve. This
merely accentuates the fact that the behaviour of the model
is crucially dependent on the intensity response function
(f(I) in Equation 5-10). Until the intensity response
function is known with more certainty, it will be impossible
to state exactly the nature of the interaction.

The model of selective adaptation is not unreasonable
in the light of parallels with auditory adaptation and
fatigue. The inclusion of a special "adaptation level" in
the state diagram of Fig. 5.11 is only one way of modelling
the decrease in sensitivity of a neural population. This is
likely to be only one component of the total effects of
adaptation and fatigue. Other effects such as hair cell
dysfunction or metabolic changes in the cochlea itself may
account for other aspects of auditory adaptation and
fatigue, but since their effects on threshold shifts or
phonetic boundary shifts will be similar, it is difficult to
separate the individual sources (Elliott and Fraser, 1970).

A model of adaptation based on rate processes (e.g.,
Equations 5-24) at least provides the basic exponential
behaviour typically observed in auditory adaptation studies
(Keeler, 1968). The important aspect of the model is that
selective adaptation is modelled in a way which produces a
functional dependence of boundary shift on adaptor intensity
and duration (or, alternatively, the number of adaptor
presentations). Only the intensity dependence has been
analyzed in this thesis; the temporal dependence of boundary
shift is of equal importance, but will require detailed data
on the recovery of phonetic boundary shifts. Such data are
currently not available in the literature, but will be
required for a more comprehensive understanding of selective
adaptation.
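The exponential behaviour that a rate-process model provides can be illustrated with the simplest possible case. The sketch below is a hypothetical two-state scheme, not Equations 5-24: a fraction p of a detector population is in an adapted state, entering it at rate k_a while the adaptor is on and leaving it at rate k_r, so that dp/dt = k_a(1 - p) - k_r·p. The closed-form solution approaches k_a/(k_a + k_r) exponentially, and setting k_a = 0 models recovery after the adaptor is switched off. The forward-Euler integration is included only as a check on the closed form; rates and names are illustrative.

```python
import math


def adapted_fraction(t, k_a, k_r, p0=0.0):
    """Closed-form solution of dp/dt = k_a*(1 - p) - k_r*p (hypothetical two-state model).
    p approaches k_a/(k_a + k_r) exponentially with rate constant k_a + k_r."""
    k = k_a + k_r
    if k <= 0:
        raise ValueError("k_a + k_r must be positive")
    p_inf = k_a / k
    return p_inf + (p0 - p_inf) * math.exp(-k * t)


def adapted_fraction_euler(t_end, k_a, k_r, p0=0.0, dt=1e-4):
    """Forward-Euler integration of the same equation, as a numerical check."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * (k_a * (1 - p) - k_r * p)
    return p


# Adaptation phase (adaptor on), then recovery (adaptor off: k_a = 0)
p_end = adapted_fraction(5.0, k_a=2.0, k_r=0.5)
p_recovered = adapted_fraction(2.0, k_a=0.0, k_r=0.5, p0=p_end)
print(round(p_end, 4), round(p_recovered, 4))
```

In such a scheme the adaptation time course depends on k_a + k_r, while the recovery time course depends on k_r alone, which is why recovery data would be needed to separate the two.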

The binaural extension of the model involves three
basic assumptions, the most important of which is that the
peripheral excitation levels are simply summed at some
central site. The second assumption is that different
stimuli have different ear preferences, and the third
assumption is that the /b/ and /d/ components of the stimuli
each partially excite their opposing detector. These three
concepts are sufficient to explain the main effects observed
in the data. Although this model is only tentative, it
produces the clear prediction that the placement of the
subjects' category boundaries along the relative intensity
(/bae/-/dae/) continuum is determined solely by the relative
ear dominances of /b/ and /d/. This prediction of the
model is partially substantiated by the present data,
and it will be of great interest to conduct detailed
experiments to analyze this particular aspect of dichotic
interactions. In this vein, it should be noted that it is
also possible to include selective adaptation in the
binaural model (trivially, by substituting Equations 5-33 for
5-10) and thereby predict the effects of adaptation on the
ear dominance curves. This has not yet been done, but the
results should help to clarify the nature of dichotic
interactions as well as that of the adaptation effects
themselves (see Cooper, 1974).

A major component of the model is the "decision plane",
which represents a two-dimensional perceptual signal space
in which the outputs of the detectors are displayed (e.g.,
Fig. 5.1). This is as much a conceptual aid in visualizing
the behaviour of the model as it is a statement of the
decision-making process. With the assumption of circular
normal probability density functions (pdfs) in this plane,
all integrations of the pdfs reduce to a single dimension
(Abramowitz and Stegun, 1967, p. 956). Consequently, the
model does not actually require postulating such a decision
plane, but doing so allows the behaviour of the model in the
discrimination, selective adaptation, etc. paradigms to be
analyzed as geometric operations on the stimulus
trajectory. This provides a measure of insight into the
nature of the model which cannot be obtained by viewing the
same operations in a single dimension.
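The reduction to one dimension can be seen in a few lines. For circular normal noise of standard deviation σ added to a point in the decision plane, the noise component perpendicular to a straight decision line is itself normal with standard deviation σ, so the probability of landing on a given side is Φ(d/σ), where d is the signed distance from the point to the line. The sketch assumes a decision line through the origin at angle φ and checks the closed form against a two-dimensional Monte Carlo estimate; the function names are illustrative only.

```python
import math
import random


def p_side(x, y, phi, sigma):
    """P(a point (x, y) plus circular normal noise of std sigma lands on the
    x-axis side of the line through the origin at angle phi) = Phi(d / sigma)."""
    d = x * math.sin(phi) - y * math.cos(phi)                    # signed distance to the line
    return 0.5 * (1.0 + math.erf(d / (sigma * math.sqrt(2.0))))  # standard normal CDF


def p_side_monte_carlo(x, y, phi, sigma, n=100_000, seed=1):
    """Monte Carlo estimate of the same probability in two dimensions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xn = x + rng.gauss(0.0, sigma)
        yn = y + rng.gauss(0.0, sigma)
        if xn * math.sin(phi) - yn * math.cos(phi) > 0:
            hits += 1
    return hits / n


print(p_side(0.52, 0.50, math.pi / 4, 0.02))
print(p_side_monte_carlo(0.52, 0.50, math.pi / 4, 0.02))
```

If φ is set from the weight ratio (footnote 5 of Chapter 6), multiplying both weights by a common factor leaves φ, and hence every probability, unchanged; this is the geometric reason only the ratio ω_1/ω_2 can be recovered from identification data.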

In summary, this model has an explanatory value which
transcends the dubious nature of some of its assumptions.

Its principal virtue is that it allows four related
phenomena (identification, discrimination, selective
adaptation and binaural fusion) to be interpreted in terms of
a common set of parameters. The additional assumptions
which each experimental paradigm requires provide a certain
measure of insight into these phenomena. The model cannot
be considered completely successful, of course, but the
failures of a model are as important as its successes. The
primary function of a preliminary model such as this is to
formalize in mathematical terms many of the proposals which
have been made in the literature, and in so doing to clarify
the implications of these proposals. The advantage of having
a model, any model, is that it enables future experiments to
be directed at obtaining information which will resolve the
issues which the model raises.

7.2 DIRECTIONS FOR FURTHER RESEARCH

This thesis introduces a new
experimental probe for speech perception studies, and the
various experiments were conducted in an attempt to
understand the nature of the probe itself. This is a
mandatory requirement before the experimental paradigm can
be used to probe other aspects of speech perception. The
understanding of the phenomenon of monaural fusion arises
from the model which was constructed to explain the
experimental effects. This model possesses the required
behaviour for not unreasonable values of the parameters, but
suffers from the indeterminability of some of its
components. Thus, future research should be primarily
dedicated to resolving the various assumptions which define
the model. Once a successful model of the monaural fusion
paradigm is created, the model may then be generalized to
other continua, and in so doing may help explain some of the
diverse phenomena observed in speech perception studies. It
will be especially advantageous to employ the same
experimental technique using simpler stimuli, e.g., rising
and falling tones, to investigate whether the sharp
transition between categories is a result of their
spectro-temporal structure or whether it is contingent on the
stimuli belonging to "competing" speech categories.


Only four experimental paradigms are described in this
thesis, but others have been investigated at an informal
level. The possibilities of the monaural fusion paradigm as
a probe are clearly not exhausted. For instance, it is
possible to construct simultaneous signal mixtures in more
than one position in a stimulus. Two stimuli, /baeb/ and
/daed/, can be mixed to produce /baeb/, /baed/, /daeb/ and
/daed/, and simultaneous boundaries for the syllable-initial
and syllable-final consonant mixtures can be obtained. A
recognition experiment using such stimuli will yield
information on how the initial consonants affect perception
of the final consonants, and vice versa. Furthermore, a
selective adaptation study using such stimuli will produce
information on the effects of syllable-initial adaptation on
syllable-final consonant perception in a manner much more
direct than that of, e.g., Ades (1974).

Another possible experiment is to set the stimulus
components to the boundary values for a subject, and then to
selectively increase the intensities of various portions of
the speech waveform. The perceptual salience of that
acoustic information can then be determined by observing the
effect on the location of the category boundary.

Preliminary  experiments  along  this  line  have  been 
conducted.  Varying  the  relative  intensities  of  each  of  the 
pitch  periods  of  the  /b/  and  /d/  waveforms  shows  that  the 
effect  on  the  boundary  is  greatest  for  the  first  pitch 
period,  and  decreases  approximately  exponentially  for  later 
pitch periods, a result that would be expected if the rate
of change of formant frequency were an important acoustic cue
for /bae/ and /dae/.
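An approximately exponential decrease could be quantified by fitting effect(k) = A exp(-k/τ) to the boundary shift measured for pitch period k = 0, 1, 2, .... The sketch below uses an ordinary least-squares line fitted to the logarithm of the effects, which is one common way to do this, not necessarily the analysis used here. No measured values are given in the text, so none are included.

```python
import math

def fit_exponential_decay(effects):
    """Fit effects[k] ~ A * exp(-k / tau) by least squares on log(effect).

    effects: positive boundary shifts for pitch periods k = 0, 1, 2, ...
    Returns (A, tau), with tau in units of pitch periods."""
    if len(effects) < 2 or any(e <= 0 for e in effects):
        raise ValueError("need at least two positive effects")
    ks = list(range(len(effects)))
    ys = [math.log(e) for e in effects]
    n = len(ks)
    k_mean = sum(ks) / n
    y_mean = sum(ys) / n
    sxx = sum((k - k_mean) ** 2 for k in ks)
    sxy = sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
    slope = sxy / sxx  # slope of log(effect) against k, equal to -1/tau
    if slope >= 0:
        raise ValueError("effects do not decay with pitch period")
    intercept = y_mean - slope * k_mean
    return math.exp(intercept), -1.0 / slope
```

Fitting on the log scale weights small late-period effects as heavily as large early ones; if those late effects are noisy, a direct nonlinear fit may be more robust.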
Higher-level judgments can be investigated using the
monaural  fusion  probe.  The  experiments  of  Cutting  (1975; 
Cutting  and  Day,  1975)  concerning  the  effects  of  semantic 
influences  on  the  boundary  placements  can  be  replicated 
using  consonant  mixtures  rather  than  dichotic  presentation 
of  entire  words.  It  will  be  interesting  to  observe  whether 
or  not  the  semantic  plausibility  of  a  phrase  in  any  way 
influences  the  placement  of  the  category  boundary.  If  it 
does,  this  may  provide  evidence  of  anticipatory  "tuning"  of 
the speech-decoding system.1

Lastly, it should be noted that the "Necker-cube"-like
phenomenon created by monaural fusion is of interest in its
own right. Informal up-down tracking experiments show a
hysteresis zone several dB in width, which is consistent with
the limits of the /bae/-/dae/ boundary shown in Fig. 3-11.
The forced-choice identification curve may then be merely the
average of the two separate identification curves obtained
with ascending and descending stimulus sequences, as shown in
Fig. 7.2. The same experiment could be conducted in the
visual domain using a Necker cube or another reversible
figure, and similar hysteresis effects should be obtained.


1 Monaural fusion does not occur for all possible consonant
mixtures. When /ba/ is mixed with /la/, for example, the
endpoint stimuli are still /b/ and /l/, but intermediate
stimuli tend to have the qualities of both. Stimuli for
which the components are approximately equal sometimes have
the identity /bla/, which is similar to the findings of
Cutting (1975).
Fig. 7.2. Identification curves obtained using a sequence
of stimuli either ascending or descending the
continuum (solid lines marked with arrows). The
dashed line represents the identification curve
that would be measured by a forced-choice test drawing
stimuli randomly from the continuum
(cf. Taylor and Aldridge, 1974).
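The averaging account can be illustrated with a simple model. The sketch assumes logistic identification curves, expresses the continuum as relative level in dB, and assumes a hysteresis zone of width `width_db` centred on the boundary. Ascending runs (from /bae/ toward /dae/) switch late, at the boundary plus half the width, and descending runs switch back late in the other direction, at the boundary minus half the width. All parameter values are hypothetical.

```python
import math

def logistic(x, midpoint, slope):
    """Logistic psychometric function rising from 0 to 1 around midpoint."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def hysteresis_curves(x_db, boundary_db, width_db, slope):
    """P(/dae/) at relative level x_db for ascending, descending and averaged runs.

    Assumed model: each run is a logistic shifted by half the hysteresis width,
    and a randomized forced-choice test sees the average of the two."""
    up = logistic(x_db, boundary_db + width_db / 2.0, slope)
    down = logistic(x_db, boundary_db - width_db / 2.0, slope)
    return up, down, 0.5 * (up + down)
```

Under these assumptions the averaged curve has the same midpoint as the hysteresis zone but a shallower slope than either directional curve, so a forced-choice test would tend to understate the sharpness of the underlying category transition.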

These are but a few of the possibilities; there are
many others. The monaural fusion paradigm (with variation
of relative intensities) is a very general technique that
can be employed at the psychophysical level using simple
tonal stimuli, or at higher levels using more complex speech
stimuli. In either case, the effects are probably
interpretable at a psychophysical level, and thus a complete
understanding of this phenomenon may help to clarify the
psychophysics of speech perception.
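Because the continuum is defined by the weight α while effects such as the hysteresis zone are described in dB, it may help to convert between the two. The sketch assumes that relative intensity means the amplitude ratio of the two components, α/(1 - α), expressed in dB; if a different reference were used, the formulas would change accordingly.

```python
import math

def alpha_to_db(alpha):
    """Level of component 1 relative to component 2, in dB, for weight alpha.

    Assumes relative intensity = 20*log10(alpha / (1 - alpha))."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 20.0 * math.log10(alpha / (1.0 - alpha))

def db_to_alpha(level_db):
    """Inverse mapping: the weight alpha giving the requested relative level."""
    r = 10.0 ** (level_db / 20.0)  # amplitude ratio alpha / (1 - alpha)
    return r / (1.0 + r)
```

Equal weights (α = 0.5) correspond to 0 dB, and a component ten times larger in amplitude than the other (α = 10/11) corresponds to +20 dB.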